Alexandria, VA 22313-1450

Dear Sir: 

I, Dr. Victoria Smith, declare and state as follows:

1. I am a Senior Scientist in the Department of Molecular Biology of Genentech,
Inc., 1 DNA Way, South San Francisco, CA 94080. 

2. My scientific Curriculum Vitae, including my list of publications, is attached to 
and forms part of this Declaration (Exhibit A). 

3. I joined Genentech in 1996. For approximately three years, I directed a laboratory 
in the Department of Molecular Biology. During this time I was involved in target discovery for 
the Tumor Antigen Project, using DNA microarrays to discover genes differentially expressed in 
tumors compared to their expression in normal tissues. In connection with the above-identified 
patent application, I directed the generation and analysis of the microarray data attached as 
Exhibit B. 

4. Exhibit B reports the results of the microarray analysis conducted on the gene
encoding PRO1800 (DNA35672) as part of the investigation of several newly discovered DNA
sequences. The column "Unq Id" identifies the gene as 851, which is DNA35672, while the
column "DNA Id" identifies the particular lot of PCR product used. The microarray experiments
were performed using well-established and accepted microarray techniques known in the art.
(See, e.g., Nature Revs. Genetics, 5:229-237 (2004), attached as Exhibit C). The DNA samples
used in the microarray studies were obtained from individual lung tumor tissue samples or
individual normal lung tissue samples. The individual tumor and normal lung samples were each
compared to pooled samples of normal epithelial tissue. The level of expression in the lung
tumor or normal lung tissue sample was compared to the normal pooled epithelial sample, and
reported as a raw ratio. The average of the normal lung samples was then used to normalize the
data to generate a ratio of expression of the PRO1800 gene in lung tumor samples compared to
the average expression in normal lung tissue. In the results reported in Exhibit B, a ratio of 2.0
or greater is a significant result, and indicates a significant increase in expression of the
PRO1800 gene in lung tumor tissue compared to the normal lung tissue controls.
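
The normalization described in paragraph 4 can be illustrated with a minimal Python sketch. The sample values are hypothetical and are not taken from Exhibit B, and the function names are illustrative assumptions. The sketch divides each raw ratio (sample/pooled epithelial) by the mean raw ratio of the normal lung samples and then applies the 2.0 cutoff stated above.

```python
from statistics import mean

SIGNIFICANCE_THRESHOLD = 2.0  # cutoff stated in paragraph 4


def normalize_ratios(sample_raw, normal_raw):
    """Divide each raw ratio by the mean raw ratio of the normal lung samples."""
    normal_mean = mean(normal_raw)
    return [raw / normal_mean for raw in sample_raw]


def is_overexpressed(normalized_ratio, threshold=SIGNIFICANCE_THRESHOLD):
    """True when a normalized ratio meets or exceeds the significance cutoff."""
    return normalized_ratio >= threshold


# Hypothetical raw ratios (sample / pooled epithelial), for illustration only.
normal_raw = [0.25, 0.5, 0.75]       # mean raw ratio of normal lung = 0.5
tumor_raw = [0.25, 0.75, 1.0, 1.5]

normalized = normalize_ratios(tumor_raw, normal_raw)
flags = [is_overexpressed(ratio) for ratio in normalized]
print(normalized)  # [0.5, 1.5, 2.0, 3.0]
print(flags)       # [False, False, True, True]
```

With these made-up inputs, only the last two samples reach the 2.0 cutoff.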

5. The results of the microarray studies reported in Exhibit B indicate that the gene
encoding PRO1800 (DNA35672) is significantly overexpressed in nine of the eighty lung tumor
samples tested compared to the normal lung tissue controls. That is the equivalent of approximately
one in every nine samples. In contrast, none of the individual normal lung tissue samples show
significant overexpression of the PRO1800 gene. In addition, the average ratio of the lung tumor
samples is significantly different from the average ratio of the individual normal lung tissue
samples (p < 0.01).
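
The proportion stated in paragraph 5 can be checked with a short worked calculation. The counts (nine of eighty) come from the paragraph above; the variable names are illustrative.

```python
from fractions import Fraction

overexpressing = 9   # lung tumor samples reported as significantly overexpressing
tested = 80          # lung tumor samples tested

fraction = Fraction(overexpressing, tested)
percent = float(fraction) * 100        # share of tumor samples, in percent
one_in_n = tested / overexpressing     # "one in every N" form

print(f"{percent:.2f}% of samples, about one in every {one_in_n:.1f}")
```

Nine of eighty is 11.25%, or about one in every 8.9 samples, which rounds to one in nine.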

6. It is well-established in the art that overexpression of the mRNA for a gene is 
likely to lead to overexpression of the corresponding protein. Support for this statement can be 
found, for example, in the Molecular Biology of the Cell, a leading textbook in the field. (Bruce 
Alberts, et al., Molecular Biology of the Cell (4th ed. 2002), excerpts submitted herewith as
Exhibit D). Figure 6-3 on page 302 illustrates the basic principle that there is a correlation 
between increased gene expression and increased protein expression. The accompanying text 
states that "a cell can change (or regulate) the expression of each of its genes according to the 
needs of the moment - most obviously by controlling the production of its mRNA," Molecular 
Biology of the Cell at 302, emphasis added. Similarly, figure 6-90 on page 364 illustrates the 
path from gene to protein. The accompanying text states that while potentially each step can be 
regulated by the cell, "the initiation of transcription is the most common point for a cell to 
regulate the expression of each of its genes." Molecular Biology of the Cell at 364. This point is 
repeated on page 379, where the authors state that of all the possible points for regulating protein 
expression, "[f]or most genes transcriptional controls are paramount." Molecular Biology of the 
Cell at 379. 

7. While not every lung tumor sample tested shows overexpression of the PRO1800
gene, the data in Exhibit B indicate that a significant portion of lung tumors do (approximately
one in every nine), while none of the normal lung tissue samples show overexpression. Given the
known correlation between overexpression of a gene and the corresponding overexpression of the
encoded protein, it is very likely that a similar number of lung tumors will overexpress the
PRO1800 protein, while normal lung tissue samples will not. Together with the data reported in
Example 16 that the gene encoding PRO1800 is amplified in some lung tumors, the results
reported in Exhibit B indicate that the PRO1800 gene and protein, as well as antibodies to the
encoded protein, can be used to differentiate some cancerous lung tissue from normal lung tissue.
Because not all lung tumors show overexpression of PRO1800, the absence of overexpression
cannot be used to classify a tested sample as non-cancerous. However, the PRO1800 gene, protein,
and corresponding antibodies are useful as a diagnostic tool since a significant portion of lung
tumors overexpress the gene and most likely the encoded protein, while no normal lung samples do.

I hereby declare that all statements made herein of my own knowledge are true
and that all statements made on information or belief are believed to be true, and further that
these statements were made with the knowledge that willful false statements and the like so made 
are punishable by fine or imprisonment or both, under Section 1001 of Title 18 of the United 
States Code and that such willful statements may jeopardize the validity of the application or any 
patent issued thereon. 



By: 




Date: 







Victoria Smith, Ph.D.

VICTORIA SMITH



Genentech Inc. 
Dept Molecular Biology 
1 DNA Way
South San Francisco CA 94080
Ph: (650) 225 7382
Fax: (650) 225 6497
Email: victoriarSteene.com 

Education 

Ph.D. (1991) Molecular Biology, University of Cambridge, Cambridge, United 
Kingdom. 

Honors (1987) Biochemistry, University of Western Australia, Australia. 
Bachelor of Science (1986) Physical and Inorganic Chemistry, and Biochemistry, 
University of Western Australia, Australia. 

Work and Research Experience 

Senior Scientist, Genentech Inc (August 1996 - present, promoted to Senior Scientist 
March 2001) 

Lab head, Dept Molecular Biology 

- Identification of potential therapeutic targets for cancer using novel microarray 
technology 

- Discovery and identification of novel secreted proteins 

- Development of cancer therapeutics

Stanford University, California, U.S.A. (February 1995 - August 1996) 
Research Fellow, Department of Biochemistry 

Research: Genomic functional analysis of chromosome V of Saccharomyces cerevisiae 

Stanford University, California, U.S.A. (January 1992 - January 1995)
Postdoctoral Fellow, Department of Genetics.

Research: Development of new methodology for whole genome functional analysis in
microorganisms using genomic sequence data and insertional mutagenesis.

Awards

Human Frontiers Science Program Organization Long Term Fellowship (accepted,
4/01/93-1/31/95).

American Cancer Society (California Division) Fellowship (1993, declined). 

Cambridge University, United Kingdom (October 1988 - December 1991) 
Research undertaken at the Medical Research Council Laboratory of Molecular Biology, 
Cambridge, UK, for the degree of Doctor of Philosophy, Cambridge University. 
Thesis: A Molecular Genetic Analysis of Yeast Chromosome IX. 
Thesis Advisor: Dr. Barclay Barrell. 

Awards 

Max Perutz Prize in 1991 for outstanding performance by a graduate student. Awarded 
for advances in genomic-scale DNA sequencing methodology, and genetic analysis of
the SNP1 gene of Saccharomyces cerevisiae.

King Edward Memorial Hospital for Women, Western Australia (1988) 
Research Scientist. 

Research: Analysis of hormone-inducible mRNAs in breast tumors. 
University of Western Australia (1984 - 1987) 

1984 - 1986: Bachelor of Science degree with double major in Biochemistry and Physical 
and Inorganic Chemistry. 

1987: First Class Honors in Biochemistry. Thesis: Nuclease Sensitivity and Methylation 
Patterns of the Phenylalanine Hydroxylase Gene. 

Awards 

Association of Commonwealth Universities Scholarship for study in the United 
Kingdom (accepted, October 1988 - October 1991, University of Cambridge). 
Hackett Scholarship for overseas study (1988, declined). 

Lady James Prize in Natural Science in 1986: Best student completing Bachelor of 
Science degree with a major in a natural science. 

JWH Lugg Prize in Biochemistry in 1986: Best student completing major in
Biochemistry.

Convocation Prize in Science in 1985: Best student completing major in second year
Physics, Geology or Chemistry.

Shell Prize in Chemistry in 1985: Best student completing major in second year 
Chemistry. 



Publications 

V. Smith, RR Shen, D. Wieand, T.H. Landon, N.A. Wong, A.M. Lessells, S. Paterson-Brown,
T.D. Wu, J.Z. Tang, K.J. Hillan, I.D. Penman, Expression Analysis of the
Metaplasia-Dysplasia-Carcinoma Sequence in Barrett's Esophagus (submitted). 

Tice DA. Szeto W. Soloviev I. Rubinfeld B. Fong SE. Dugger DL. Winer J. Williams PM.
Wieand D. Smith V. Schwall RH. Pennica D. Polakis P. Journal of Biological Chemistry 
277(16):14329-35, 2002 Apr 19. 

N.J. Maughan, F. Lewis, V. Smith (2001) Journal of Pathology 195, 3-6.

F. Lewis, N.J. Maughan, V. Smith, K. Hillan, P. Quirke (2001) Journal of Pathology 195, 66-71.

D.J. Garfinkel, M.J. Curcio and V. Smith (1998) Ty Mutagenesis Methods in Microbiology, 
volume 26, 101-117. 

C. Churcher, S. Bowman, K. Badcock, A. Bankier, D. Brown, T. Chillingworth, R.
Connor, K. Devlin, S. Gentles, N. Hamlin, D. Harris, T. Horsnell, S. Hunt, K. Jagels, M.
Jones, G. Lye, S. Moule, C. Odell, D. Pearson, M. Rajandream, P. Rice, N. Rowley, J.
Skelton, V. Smith, S. Walsh, S. Whitehead & B. Barrell (1997) Nature, 387, 84-87.

F. S. Dietrich, J. Mulligan, K. Hennessy, M. A. Yelton, E. Allen, R. Araujo, E. Aviles, A.
Berno, T. Brennan, J. Carpenter, E. Chen, J. M. Cherry, E. Chung, M. Duncan, E.
Guzman, G. Hartzell, S. Hunicke-Smith, R. W. Hyman, A. Kayser, C. Komp, D.
Lashkari, H. Lew, D. Lin, D. Mosedale, K. Nakahara, A. Namath, R. Norgren, P. Oefner,
C. Oh, F.X. Petel, D. Roberts, P. Sehl, S. Schramm, T. Shogren, V. Smith, P. Taylor, Y.
Wei, D. Botstein & R. W. Davis (1997) Nature 387, 78-81.

V. Smith, K. Chou, D. Lashkari, D. Botstein, and P. O. Brown. (1996). Functional
Analysis of the Genes of Yeast Chromosome V by Genetic Footprinting. Science 274, 
2069-74 

V. Smith, D. Botstein, and P. O. Brown (1995) Genetic Footprinting: A genomic strategy 
for determining a gene's function given its sequence. Proc. Natl. Acad. Sci. U.S.A., 92,
6479-6483. 

V. Smith, M. Craxton, A. T. Bankier, C. M. Brown, W. D. Rawlinson, M. Chee, and B. G. 
Barrell (1995) Microtiter methods for the preparation and fluorescent sequencing of M13 
clones. Recombinant DNA Methodology II: a volume in the Selected Methods in Enzymology
Series, pp. 607 - 621. 

V. Smith, M. Craxton, A. T. Bankier, C. M. Brown, W. D. Rawlinson, M. S. Chee, and B.
G. Barrell (1993) Microtiter methods for the preparation and fluorescent sequencing of
M13 clones. Methods in Enzymology, 218, 173-187.

V. Smith and M. S. Chee (1991) A simple method for sequencing the complementary
strand of ssDNA from M13 clones. Nucleic Acids Research 19, 6957.

V. Smith and B. G. Barrell (1991) Cloning of a Yeast U1 snRNP 70K protein homologue:
functional conservation of an RNA binding domain between humans and yeast. EMBO
Journal 10, 2627-2643.

W. D. Rawlinson, M. S. Chee, V. Smith and B. G. Barrell (1991) Preparation of large
numbers of single stranded DNA templates by rescue from phagemids in microtiter
plates. Nucleic Acids Research 19, 4779.

V. Smith, C. M. Brown, A. T. Bankier, and B. G. Barrell (1990) Semi-automated
preparation of DNA templates for large scale sequencing projects. DNA Sequence 1, 73-78.

S. J. Wysocki, E. Hahnel, S. P. Wilkinson, V. Smith, and R. Hahnel (1990)
Hormone-sensitive gene expression in breast tumors. Anticancer Research 10, 185-188.

S. J. Wysocki, E. Hahnel, A. Masters, V. Smith, A. J. McCartney and R. Hahnel (1990)
Detection of pS2 messenger RNA in gynecological cancers. Cancer Research 50, 1800-1802.



Patents 

Genetic Footprinting: Insertional Mutagenesis and Genetic Selection. U.S. Patent No.
5,612,180. Inventors: Patrick Brown and Victoria Smith

PATENTS FILED (at Genentech Inc.)

Methods of Detecting and Quantifying Gene Expression. Inventors: Victoria Smith, 
Edward Robbie, David Lowe, James Marsters. 

Compositions and Methods for the Treatment of Cancer. Inventors: Victoria Smith, 
Austin Gurney, Audrey Goddard, Fred DeSauvage 

Diagnostic for Dysplasia in Barrett's Esophagus. Inventor: Victoria Smith 

Numerous composition of matter filings related to novel gene discovery, pending

Unq Id   DNA Id     Experiment Name

Lung Tumor Samples

851.     157,736.   100ngLungBaCa1069 vs 25ngEpi1409
851.     157,736.   lung 1431 vs epi pool
851.     157,736.   lung SqCa R694 vs epi pool
851.     157,736.   Lung SqCa-hf 1649
851.     157,736.   Lung SqCa-hf 1649
851.     157,736.   lung tumor 1055 vs epi pool
851.     157,736.   lung tumor 10ng
851.     157,736.   lung tumor 1370 vs epi pool
851.     135,920.   lung tumor 1647
851.     135,920.   lung tumor 1648
851.     157,736.   lung tumor 1685/ref.RNA
851.     135,920.   Lung Tumor 685
851.     157,736.   lung tumor 688 vs epi pool
851.     135,920.   lung tumor 734
851.     135,920.   lung tumor 735
851.     135,920.   lung tumor 737
851.     135,920.   lung tumor 738
851.     135,920.   lung tumor 739
851.     101,241.   Lung tumor hf 842
851.     135,920.   Lung Tumor HF-0017083
851.     135,920.   lung tumor hf-1291
851.     157,736.   lung tumor hf-1333
851.     157,736.   lung tumor hf-1340
851.     157,736.   lung tumor hf-1364
851.     157,736.   lung tumor hf-1366
851.     157,736.   lung tumor hf-1587
851.     157,736.   lung tumor hf-1587 v normal
851.     157,736.   lung tumor hf-1587 v. normal
851.     135,920.   lung tumor hf-1596
851.     135,920.   lung tumor hf-1646
851.     135,920.   lung tumor hf-1649
851.     135,920.   lung tumor hf-1655
851.     135,920.   lung tumor hf-1719
851.     157,736.   lung tumor hf-1775
851.     157,736.   lung tumor hf-1785
851.     135,920.   Lung Tumor HF1602
851.     135,920.   Lung Tumor HF1651
851.     135,920.   Lung Tumor HF1729
851.     101,241.   Lung Tumor HF631
851.     101,241.   Lung tumor hf840
851.     157,736.   lung tumor R1057 vs epi pool
851.     157,736.   lung tumor R1094 vs epi pool
851.     157,736.   lung tumor R1094 vs epi pool
851.     157,736.   lung tumor R1372 vs epi pool
851.     157,736.   lung tumor R1372 vs epi pool
851.     157,736.   lung tumor R417 vs epi pool



Unq Id   DNA Id     Experiment Name                Raw Ratio                    Normalized Ratio
                                                   (sample/pooled epithelial)   (sample/normal lung)

851.     157,736.   lung tumor R542 vs epi pool    0.454                        0.798
851.     157,736.   lung tumor R543 vs epi pool    0.651
         157,736.   lung tumor R544 vs epi pool
         157,736.   lung tumor R544 vs epi pool
         157,736.   lung tumor R693 vs epi pool
851.     157,736.   1415LungInflamTA1              0.706                        1.241
851.     157,736.   1415LungInflamTA1              0.494                        0.869



PERSPECTIVES 



GUIDELINES 



Expression profiling — best practices 
for data generation and interpretation 
in clinical trials 

The Tumor Analysis Best Practices Working Group* 



Microarrays are routinely used to assess 
mRNA transcript levels on a genome-wide 
scale. As use and acceptance increases, 
there is intensified focus on appropriate 
methods of data generation and 
interpretation, with important questions 
being asked about the best data analysis 
methods. The development of such 'best 
practices' is needed, as microarrays — in 
particular, Affymetrix oligonucleotide arrays 
— are becoming increasingly important 
in human clinical trials, both for 
differential diagnosis and monitoring 
of pharmacological efficacy. Here, 
representatives from high-volume 
microarray core centres consider the 
current status of 'best practices', 
focusing on the broadly used Affymetrix 
oligonucleotide arrays. 

Microarrays represent a major technological advance in molecular biology. The
introduction of any such advance is typically followed by a period of optimization and
standardization. The latter is a crucial part of any maturing technology, as it allows an
approach in which advances are made in parallel by individual researchers and companies
who contribute new knowledge based on the existing standard. Any such standards must be
constantly reassessed; stale or stagnant standards can inhibit the development of the
technology.

Microarray-based mRNA-expression profiling can be considered to be the first mature
genome-wide analysis technology, reflected in an increased interest in using microarrays as
an endpoint in clinical trials. However, regulations of clinical trials require the development
of clear standards for use and interpretation of microarray data (commonly referred to as
quality control and standard operating procedures (QC/SOPs) and/or 'best practices').
Guidelines for reporting and annotation of microarray data from the Microarray Gene
Expression Data (MGED) Society (see online links box), using MIAME (Minimum
Information About A Microarray Experiment) standards (BOX 1) and the MAGE-ML mark-up
language (refs 1,2), represent an important step towards this goal. The efforts of this
multinational academic-industry partnership have made it possible to develop databases that
can house the many types of microarray data (see below) within the same data structure,
enabling some data queries between experiments and experimental platforms. The
ArrayExpress microarray database (ref. 3) (see online links box) is the first major publicly
accessible database that adheres to this universal data-presentation platform, and some
prominent journals (such as Nature, Cell, EMBO Journal and The Lancet) now demand that
published microarray data conform to the MIAME standards. In addition, microarray
manufacturers, such as Affymetrix, have implemented MIAME-compliant data output in their
new software releases.

The MGED Society has effectively developed data-reporting guidelines, but it has not
addressed issues of data generation and interpretation. The latter are more intimately
coupled to the specific experimental platform. Each of the three commonly used types of
microarrays (spotted cDNA, spotted oligonucleotide and Affymetrix arrays) has distinct
methodologies associated with it; accordingly, the issues of data interpretation are also
different (BOX 2). These differences make it difficult or impossible to develop
cross-platform guidelines for data generation and interpretation. Best practices for spotted
cDNA arrays are especially problematic because the manufacture of the arrays varies
considerably from place to place. In addition, all spotted arrays use co-hybridization of a
test RNA sample labelled with one colour fluorophore with a control RNA labelled with a
different colour to which the test is compared on the same spot. The output is in the form of
a ratio of hybridization signals that is comparable to other experiments only if the same
control RNA is always used. Therefore, the development of standards in spotted arrays
would require all laboratories to use the same control RNA solution before data could be
easily compared.
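
The dependence of a two-colour ratio on the choice of control RNA can be shown with a minimal Python sketch. The intensities are hypothetical, and expressing the ratio on a log2 scale is an assumption made here for readability; it is a common convention rather than something prescribed by the text above.

```python
import math


def log2_ratio(test_signal, control_signal):
    """Log2 of the test/control hybridization ratio measured on one spot."""
    return math.log2(test_signal / control_signal)


# Hypothetical intensities (arbitrary units): the same test RNA on the same spot,
# co-hybridized with different control RNAs in two laboratories.
test_signal = 800.0
lab_a_control = 400.0
lab_b_control = 1600.0

ratio_lab_a = log2_ratio(test_signal, lab_a_control)
ratio_lab_b = log2_ratio(test_signal, lab_b_control)
print(ratio_lab_a, ratio_lab_b)  # 1.0 -1.0
```

The test sample is identical in both cases, yet it appears two-fold up in one laboratory and two-fold down in the other, purely because the controls differ.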



Manufactured oligonucleotide arrays (both mechanically spotted and synthesized in situ)
have the advantage of being centrally produced under controlled conditions. Affymetrix
photolithography-produced arrays have been available for nearly 10 years, whereas
mechanically spotted oligonucleotide arrays have only very recently begun to appear in the
marketplace. For example, Agilent Technologies (see online links box) recently released
17,000 60-mer oligonucleotides printed five times each on glass slides (85,000 features).
Spotted oligonucleotide arrays typically have a single spot per gene (single probe
measurement), whereas Affymetrix arrays provide multiple measurements: a series of
independent or semi-independent oligonucleotides query each RNA in solution (the probe
set) (BOX 2). Affymetrix probe sets are constructed from a series of perfect-match and
paired-mismatch oligonucleotides, allowing some assessment of non-specific binding and
performance of the probes. Overall, the Affymetrix probe sets provide a variety of
measurements that allow robust measures of gene expression. The use of multiple
perfect-match and mismatch probes for each gene enables the development of different
methods of interpreting the hybridization patterns across the probe set and calculating a
single 'expression level' or 'signal' that reflects the gene's relative expression level. A
number of probe-set interpretation algorithms for Affymetrix arrays are available (see
below for discussion).
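
As a rough illustration of how a probe set of perfect-match (PM) and mismatch (MM) pairs can be collapsed into a single signal, the sketch below takes the mean PM minus MM difference. This is a deliberately simplified assumption and is not presented as any vendor's or published probe-set algorithm; the intensities are hypothetical, and the choice of 11 probe pairs follows the array description in Box 2.

```python
from statistics import mean


def probe_set_signal(perfect_match, mismatch):
    """Summarize one probe set as the mean PM - MM intensity difference."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("each perfect-match probe needs a paired mismatch probe")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))


# Hypothetical intensities for one probe set of 11 probe pairs (22 probes).
pm = [120, 150, 135, 160, 140, 155, 130, 145, 150, 125, 140]
mm = [20, 50, 35, 60, 40, 55, 30, 45, 50, 25, 40]

signal = probe_set_signal(pm, mm)
print(signal)  # 100
```

Subtracting the mismatch intensity is one simple way to discount non-specific binding. Practical algorithms differ in how they weight probes and handle outliers.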



The increasing use of Affymetrix microarrays, and the emergence of this technology as an
endpoint in clinical trials, has led to requests to develop, in both the pharmaceutical and
academic research communities, best practices in data generation and analysis. Given the
many differences between spotted cDNA, spotted oligonucleotide and Affymetrix arrays, the
best practices need to be developed separately for each experimental platform; this is in
contrast to data reporting that can be standardized across all platforms (BOXES 1 and 2).
The Tumor Analysis Best

Box 1 | The MIAME guidelines for data reporting

The Microarray Gene Expression Data Society (MGED) is an international discussion group of
microarray experts, with the primary goal of developing methods for data sharing between
experimental platforms. The main output of this group has been the Minimum Information
About A Microarray Experiment (MIAME) guidelines for microarray data annotation and
reporting. The guidelines have been adopted by a number of scientific journals and have recently
been endorsed for use by the US Food and Drug Administration and the US Department of
Agriculture for pharmacogenomics projects.

The MIAME guidelines include descriptions of experimental design (number of replicates,
nature of biological variables), samples used, extract preparation and labelling, hybridization
procedures and parameters, and measurement data and specifications. These guidelines have been
most important for the spotted cDNA and oligonucleotide experimental platforms (see BOX 2) in
which the flexibility in microarray design and utilization also leads to considerable variation in
array data generation and reporting between different laboratories. The guidelines do not attempt
to dictate how experiments should be done, but rather provide adequate information associated
with any published or publicly available experiment so that the experiment can be reproduced.



Box 2 | Microarray experimental platforms

There are three different types of microarray in common use: spotted cDNAs, spotted 
oligonucleotides and Affymetrix arrays. 

Spotted cDNA arrays 

Spotted cDNA arrays typically use sets of plasmids of specific cDNAs in gridded liquid aliquots. 
The inserts of each clone are typically amplified by PCR, and a few picolitres are physically 
spotted onto glass slides by liquid-handling robots. Robotic spotters can spot 100,000 spots per 
slide, and duplicate sets of clones are often spotted. The advantages of spotted cDNA arrays are 
that the content of each microarray is determined by the researcher, with complete flexibility in
number and type of cDNA clones spotted. Also, the cost per array is relatively low, as the clone
sets are a PCR-renewable resource and the glass slides are themselves inexpensive. The amount of
the RNA that corresponds to each spot is determined relative to a second control RNA solution 
that is hybridized to the same spot, and a ratio is obtained. 

Disadvantages of spotted cDNA arrays include the variable amount of DNA spotted in each 
spot, the 10-20% 'drop out' rate of failed PCR reactions or failed spots and mis-identification of 
clones (that is, the spot is not what you think it is). Also, there is no control over the actual 
sequence of the clone. As many gene-coding sequences contain regions of sequence that are 
shared with other genes, there are questions of specificity of the hybridization to the relatively 
large cDNA inserts. Spotted cDNA arrays were embraced by most academic centres, owing to 
their flexibility and relatively low cost.

Spotted oligonucleotide arrays 

These arrays are also built by liquid handling on glass slides; however, the input solution is a 
synthetic oligonucleotide (often 60-70-mers). The resulting spotted material is typically of 
known concentration, of known sequence and is single stranded (all advantages relative to 
spotted cDNAs). Most of the process can be automated, leading to less sample mix-up and less 
drop-out of samples. 

Disadvantages of spotted oligonucleotides include the relatively high cost of synthesizing large 
numbers of large oligonucleotides and the non-renewable nature of the resource. Spotted
oligonucleotide arrays are becoming increasingly available. 

Affymetrix GeneChips 

These microarrays are factory designed and synthesized. Design is done using software to
choose a series of 11 25-mer probes from the 3' end of each transcript or predicted transcript
in the genome; each of the 11 probes is then paired with a similar mismatch probe that is designed
to contain a mutation in the centre. The latter serves as a form of control for hybridization
specificity. Synthesis of arrays is done using light-activated chemistry and photolithography
methods, and feature size can be reduced to approximately 8 µm², with about 1 million probes in a
1.2 cm² glass area. Probe-set algorithms interpret the signals from each 22-oligonucleotide probe
set, and derive a single value (signal) from the patterns of hybridization to the 22 individual probes.
This signal is then normalized to the entire microarray, or to the probe sets across an entire project.
For a more general discussion of normalization and analysis methods of different microarray 
platforms, the reader is referred to the excellent web information resource of the MGED group 
(see The MGED Data Transformation and Normalization Working Group in online links box). 



Practices Working Group (see BOX 3) was convened to discuss and develop best practices
for Affymetrix microarrays, including QC and SOPs for both data generation and data
analyses. The first meeting was held in Santa Clara in March 2003, followed by a series of
conference calls that focused on discussions of data generation and analysis standards for
the Affymetrix oligonucleotide arrays. The Working Group deliberately focused on a
platform that has widespread usage and is most likely to be used in clinical trials owing to
the previously standardized manufacturing process. Here, we discuss recommendations for
experimental design, probe-set analysis algorithms, signal/noise assessments and
biostatistical methods.

Experimental design 

Appropriate experimental design is a key aspect of all science, and microarray studies are
no exception. The relatively high cost of some commercial microarray platforms is a
frequently cited reason for suboptimal experimental design, especially with regard to the
number of replicates. Data interpretation is inevitably compromised when replicates are
decreased.

Replication in cross-sectional studies. The appropriate number of microarray replicates for
any particular condition or time point depends on the source of biological variability in the
study samples. Inter-individual variability is very large in outbred (genetically
heterogeneous) humans, but is very small within inbred mouse strains. For example,
expression profiles derived from muscle from different mice are not more variable than from
muscles isolated from one mouse (ref. 4). Defining the confounding variables that contribute
to experimental variability, such as intra-subject, inter-subject, inter-group and technical
variation (microarray protocol), is needed to design and statistically power a study, and to
determine the number of replicates that are needed. In general, inbred mice require testing
only three or four mice per group. We and others have found that five or six outbred rats per
group provide statistically robust results (refs 5,6). By contrast, human samples require
considerably more individuals per group. Key variables in human samples include tissue
heterogeneity, stage of disease and inter-individual variation, all of which have been found
to be major confounding variables (ref. 7).

Replication in longitudinal studies. It has long 
been recognized that, in human clinical trials, 
longitudinal designs provide considerably 
greater power at lower numbers of replicates. 
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Box 3 1 The Tumor Analysis Best Practices Working Group* 

The Tumor Analysis Best Practices Working Group is a group of investigators who study the best 
practices of tumour analysis in humans taking part in clinical trials. The following authors are 
members of the Group: 

• Eric P. Hoffman is at the Research Center for Genetic Medicine, Children's National Medical 
Center, Washington DC 20010, USA. email: ehofifhian@cnmcresearch.org 

• Tarif Awad, John Palma, Teresa Webster, Earl Hubbell and Janet A. Warrington are at Affymetrix, 
Santa Clara, California 95051, USA. emails: tarif_awad@affymetrix.com; 
john_palma@affymetrix.com; teresa_webster@affymetrix.com; earl_hubbell@affymetrix.com; 
janet_warrington@affymetrix.com 

• Avrum Spirais at The Pulmonary Center, Boston University Medical Center and the 
Bioinformatics Program, Boston University, Boston, Massachusetts 021 18, USA. e-mail: 
aspira@lung.bumc.bu.edu 

• George Wright is at theBiometric Research Branch, Division of Cancer Treatment and 
Diagnosis, National Cancer Institute, National Institute of Health, Bethesda, Maryland 20892, 
USA. e-mail: wrightge@mail.nih.gov 

• Jonathan Buckley and Tim Triche are at the Children's Hospital, University of California, Los 
Angeles, California 90089, USA. e-mail: buckley@hsc.usc.edu; triche@hsc.usc.edu 

• Ron Davis, Robert Tibshirani and Wenzhong Xiao are at Stanford University, Palo Alto, 
California 94303, USA. e-mails: dbowe@stanford.edu; tibs@statstanford.edu; 
wzxiao@pmgm2.stanford.edu 

• Wendell Jones is at Expression Analysis Inc., Durham, North Carolina 27713, USA. e-mail:
wjones@expressionanalysis.com

• Ron Tompkins is at Harvard University, Boston, Massachusetts 02115, USA. e-mail:
rtompkins@partners.org

• Mike West is at the Institute of Statistics and Decision Sciences, Duke University, Durham,
North Carolina 27708, USA. e-mail: mw@stat.duke.edu



They best control for inter-individual variation
because each subject serves as their own
control. Serial blood sampling from single 
subjects is the least invasive 8 (see below for 
further discussion), and, for example, cancer 
patients are often longitudinally sampled 9 . 
Serial biopsies of other tissues are more invasive;
however, a number of serial human muscle
biopsy studies of healthy volunteers after
different types of exercise training have begun
to appear in the literature 10,11 .
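
The way a longitudinal design lets each subject act as their own control can be sketched with a paired t statistic, which is computed on within-subject differences only. This is a minimal illustration in Python; the function name and the signal values are hypothetical and are not taken from the Working Group.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements.

    Each subject contributes one 'before' and one 'after' value, so the
    statistic is built from within-subject differences and baseline
    differences between subjects cancel out.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two subjects")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical log2 signals for one probe set in four subjects.
before = [7.0, 9.0, 5.0, 8.0]
after = [8.0, 10.5, 6.0, 9.5]
t, df = paired_t_statistic(before, after)
```

The baselines in this example range from 5.0 to 9.0, yet the statistic depends only on the four differences (1.0, 1.5, 1.0, 1.5).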

Expression profiling of blood samples 
(longitudinal or cross-sectional design) is the 
protocol that is most likely to be used in 
human clinical trials. One of the Working 
Group's goals was to establish SOPs for blood 
sample collection and RNA isolation in clini- 
cal trials. A specific follow-up report of these 
recommendations will be published else- 
where. Such a protocol must be easily adapt- 
able to multiple trial sites, with relatively little 
need for resident expertise to carry out the 
isolation protocol. So far, standard methods for
isolating peripheral-blood mononuclear cells 
have shown the most reproducibility, although 
others are being tested (see Affymetrix Tech- 
nical Note in online links box). Cells isolated 
soon after collection can be flash frozen for 
storage and subsequent RNA isolation or an 
RNA stabilizing compound can be added if 
the samples need to be transported. 

Tissue/cell heterogeneity. Tissue heterogeneity 
is a major confounding variable in most micro- 
array experiments. In inbred mice, tissue het- 
erogeneity is typically normalized by using 
whole organs. This is rarely possible in human 
experiments, and particularly not in clinical 
trials; the limited amount of human tissue 
that is available exacerbates heterogeneity. The 
mixed cell populations of peripheral blood 
can be thought of as a tissue heterogeneity 
problem similar to that encountered in all 
solid tissue and tumour biopsies. Indeed, a 
recent study showed that variation as a result 
of tissue variability in human muscle biopsies 
often exceeded inter-individual variability 12 . 
One potential solution to the tissue hetero- 
geneity problem lies in bioinformatic meth- 
ods. If computer software can be trained to 
recognize the expression profiles of each indi- 
vidual cell type within a mixed tissue sample, 
then it should be possible to subtract them 
from each other and to renormalize to obtain 
a set of cell-specific expression profiles derived 
from a single mixed profile. This will be most 
easily done on tumour biopsies, in which the 
main cells of interest are tumour versus conta- 
minating normal tissue. Although there are no 
published examples so far, such methods are 
maturing rapidly. 
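
The subtract-and-renormalize idea can be illustrated with the simplest possible case: a biopsy made of tumour and contaminating normal tissue. The sketch below assumes a linear mixing model on the raw signal scale and a tumour fraction that is already known (for example, from histology). Both are simplifying assumptions introduced here for illustration, and the function and values are hypothetical rather than a published method.

```python
def estimate_tumour_profile(mixed, normal, tumour_fraction):
    """Estimate a tumour-only profile from a mixed-tissue profile.

    Assumes a linear mixing model on the raw (non-log) signal scale:
        mixed = f * tumour + (1 - f) * normal
    where f is the tumour fraction of the biopsy. Both the model and
    the idea that f is known in advance are simplifying assumptions.
    """
    if not 0 < tumour_fraction <= 1:
        raise ValueError("tumour_fraction must be in (0, 1]")
    if len(mixed) != len(normal):
        raise ValueError("profiles must cover the same probe sets")
    f = tumour_fraction
    tumour = []
    for m, n in zip(mixed, normal):
        value = (m - (1 - f) * n) / f   # subtract normal part, rescale
        tumour.append(max(value, 0.0))  # signals cannot be negative
    return tumour

# Made-up signals for three probe sets.
normal = [100.0, 200.0, 50.0]
mixed = [160.0, 200.0, 35.0]
tumour = estimate_tumour_profile(mixed, normal, tumour_fraction=0.6)
```

With a 60% tumour fraction, the three estimated tumour signals are approximately 200, 200 and 25.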



An experimental alternative to mitigate 
confounding tissue heterogeneity is to isolate 
pure cell populations for expression profiling. 
Many such methods are well developed in the 
research laboratory, including
FLUORESCENCE-ACTIVATED CELL SORTING (FACS) 13 ,
NEGATIVE CELL ISOLATIONS from blood (for example,
StemCell Technologies RosetteSep) 14 and laser
capture microdissection 15 . To research scientists, the
profiles that are derived from isolated cell types 
are a more intuitive approach to define biologi- 
cally relevant pathways. However, it should be 
noted that uses of array-based analysis of gene 
expression approved by the US Food and Drug 
Administration (FDA) will probably focus on 
reproducibility and robustness (as well as on 
predictive accuracy), rather than on biological
interpretation or justification. The high-tech 
methods used to isolate specific cell types from 
clinical samples are unlikely to make their way 
into clinical trials unless tissues are procured in 
a highly centralized way. 

Procedural variation. Beyond the usual issues 
of sampling and accrual, gene-expression data 
will be subject to many additional sources of 
error. For example, the surgical removal and 
processing of tumour tissue can vary consid- 
erably from site to site. Laboratory QC proce- 
dures in tissue handling, RNA extraction and 
processing, and variations in protocols for 
data management and processing will need to 
be addressed in any clinical trial design. In 
particular, prolonged tissue ischaemia prior to 
processing of surgically resected tissue can sig- 
nificantly alter gene expression 16 . All tissue 
samples should be flash frozen within min- 
utes of surgery and stored at -80°C or below. 
Samples should also be kept in small, airtight 
containers and kept from drying out during 
frozen storage by placing fragments of ice in 
with the sample. 

Technical variability 

The standard laboratory protocol for generat- 
ing RNA profiles using Affymetrix microarrays 
involves a series of steps (FIG. 1).

Figure 1 | Sample processing and microarray
interpretation of Affymetrix GeneChips. Flash-
frozen tissue (~50 mg) is homogenized to isolate
total RNA. Single-stranded (ss) cDNA and then
double-stranded (ds) cDNA is made from ~5 µg
of total RNA. Double-stranded cDNA contains a
T7 RNA-polymerase promoter adjacent to the
3' polyA tail of each transcript. It is transcribed
in vitro to generate more than 400 biotinylated
cRNA molecules for each ds cDNA molecule. The
biotinylated cRNA is fragmented and hybridized
to the microarrays. Each transcript is queried by
one or more probe sets of 11 perfect-match and
11 paired-mismatch oligonucleotides (the latter
contain a centrally located point mutation as a
form of hybridization specificity control). Currently
available Affymetrix microarrays have ~54,000
probe sets on each 1.28 cm² glass microarray
(~1.2 million 25-mer oligonucleotides on the
HG-U133 Plus 2.0 array). The biotinylated
cRNA fragments hybridize to the appropriate
oligonucleotide features. A laser scanner determines
the amount of bound biotinylated cRNA indirectly
through the streptavidin-conjugated phycoerythrin
fluorescence at each feature within a probe set. The
component probe pairs are interpreted and
averaged to arrive at a single signal that reflects the
relative abundance of the original mRNA. Probe
sets are interpreted by any one of a number of
probe-set algorithms, each providing a signal that
reflects the relative hybridization intensity across the
probe set. RT, reverse transcriptase.



RNA isolation. RNA quality and quantity are
crucial to the success and reproducibility of
the expression profiles. RNA quantity and
quality are generally checked by complementary
methods: UV 260/280 ratio >1.8, agarose
gel electrophoresis or an Agilent Bioanalyzer
to visualize clear 18S and 28S ribosomal RNA
bands. Total RNA (5-10 µg) is input into the
cDNA/cRNA reaction, with an expected corrected
yield of biotinylated cRNA of between
4- and 10-fold greater than the total RNA
input (so 5 µg of total RNA must yield at least
20 µg of biotinylated cRNA, or the sample is
discarded). The biotinylated cRNA should be
500-3,000 base pairs (bp) in size. After
fragmentation, the cRNA should be 50-200 bp.
The Working Group recommends that samples
that do not meet these criteria should be
discarded.
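
The numeric criteria in this paragraph can be written down as a simple checklist. The Python sketch below is only an illustration: the function name, its parameters and the wording of the failure messages are hypothetical, and gel or Bioanalyzer inspection of the ribosomal RNA bands is not modelled.

```python
def rna_qc_failures(a260_280, total_rna_ug, crna_ug,
                    crna_size_bp, fragment_size_bp):
    """List the criteria a sample fails; an empty list means it passes.

    crna_size_bp and fragment_size_bp are (smallest, largest) tuples
    read off a gel or Bioanalyzer trace.
    """
    failures = []
    if a260_280 <= 1.8:
        failures.append("260/280 ratio is not above 1.8")
    if crna_ug < 4 * total_rna_ug:
        failures.append("cRNA yield is below 4-fold of input RNA")
    if crna_size_bp[0] < 500 or crna_size_bp[1] > 3000:
        failures.append("unfragmented cRNA outside 500-3,000 bp")
    if fragment_size_bp[0] < 50 or fragment_size_bp[1] > 200:
        failures.append("fragmented cRNA outside 50-200 bp")
    return failures

# 5 µg of input RNA that yields only 18 µg of cRNA misses the 20 µg minimum.
problems = rna_qc_failures(1.9, 5.0, 18.0, (600, 2500), (60, 180))
```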

If RNA amount is limiting (as is the case,
for example, with laser capture microscopy
samples, flow-sorted cell samples or small
tissue samples), a two-round amplification
protocol can be used. For example, 200 ng of
total RNA is processed for in vitro transcrip- 
tion (IVT), with the same goal of 4-10-fold 
amplification (>800 ng of cRNA output). 
One hundred nanograms of this cRNA is 
then reverse transcribed into cDNA using 
random primers, after which a second IVT is 
done. The second round IVT should result in 
a 400-fold amplification. 
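
The arithmetic of the two-round example can be checked with a few lines of Python. The function below is a hypothetical helper; the default values are the figures quoted in the paragraph, taking the lower bound of 4-fold for the first round.

```python
def two_round_yield_ng(total_rna_ng=200.0, first_round_fold=4.0,
                       carried_over_ng=100.0, second_round_fold=400.0):
    """Return (first-round cRNA, second-round cRNA) in nanograms."""
    first_round_crna = total_rna_ng * first_round_fold
    if carried_over_ng > first_round_crna:
        raise ValueError("cannot carry over more cRNA than was made")
    second_round_crna = carried_over_ng * second_round_fold
    return first_round_crna, second_round_crna

first, second = two_round_yield_ng()
```

With the defaults, the first round gives 800 ng of cRNA, and 100 ng of that amplified 400-fold gives 40,000 ng (40 µg) after the second round.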

Microarray controls. Hybridization controls 
include visualization of the image so that any 
abnormalities in hybridization patterns can be 
detected. ProbeProfiler from Corimbia Inc. is 
a program with extended capabilities for 
detecting defects in microarray manufacture. 
Affymetrix MAS 5.0 software adjusts the 
microarray-scanned image to a common tar- 
get intensity by using a scaling factor. In addi- 
tion, a general index of chip background and 
noise is represented by the percentage of 'present
calls' (probe sets for which the hybridization
to the perfect-match probes is significantly
higher than mismatch hybridization). The 
Working Group believes that both the scaling 
factor and the percentage of present calls are 
important QC criteria. Considering MAS 5.0 
chip analyses, the scaling factors to normalize 
chips within a project should lie within two 
standard deviations of the mean, with present 
calls being greater than 25% (BOX 4). The percentage
of present calls is often lower when
B or C arrays that contain higher propor- 
tions of more poorly characterized transcript 
units (expressed sequence tags or computer- 
predicted open reading frames) are used. The 
percentage of present calls across a set of samples
should be consistent, within a range of 10%.
Some software packages allow the identification
of statistical 'outlier' microarrays in
a group of microarrays in a given project, 
which additionally enables the experimenter 
to flag and exclude specific microarrays that 
are not acceptable for an analysis. In addition 
to these criteria, acceptable hybridizations 
must have adequately intact input RNA as 
shown by 3' to 5' ratios of hybridization within 
probe sets. A typical control is the glyceralde- 
hyde 3-phosphate dehydrogenase (GAPDH) 
gene, which should have 3' to 5' ratios of less 
than 3 (BOX 4).
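
The chip-level criteria above (scaling factor within two standard deviations of the project mean, present calls above 25%, present calls consistent within a range of 10%, GAPDH 3' to 5' ratio below 3) can be applied to a whole project at once. The sketch below is a minimal illustration; the function, the dictionary keys and the chip values are hypothetical, and it does not reproduce any Affymetrix software.

```python
from statistics import mean, stdev

def review_project(chips, min_present=25.0, max_present_range=10.0,
                   max_gapdh_ratio=3.0):
    """Flag chips that miss the QC criteria described above.

    chips maps a chip name to a dict with the hypothetical keys
    'scaling_factor', 'present_pct' and 'gapdh_3to5'.
    Returns (failed criteria per chip, project-level range check).
    """
    if len(chips) < 2:
        raise ValueError("need at least two chips to estimate the spread")
    factors = [c["scaling_factor"] for c in chips.values()]
    centre, spread = mean(factors), stdev(factors)
    failures = {}
    for name, c in chips.items():
        failed = []
        if abs(c["scaling_factor"] - centre) > 2 * spread:
            failed.append("scaling factor")
        if c["present_pct"] <= min_present:
            failed.append("present calls")
        if c["gapdh_3to5"] >= max_gapdh_ratio:
            failed.append("GAPDH 3'/5' ratio")
        failures[name] = failed
    present = [c["present_pct"] for c in chips.values()]
    range_ok = max(present) - min(present) <= max_present_range
    return failures, range_ok

# Seven well-behaved chips and one poor hybridization (made-up values).
chips = {f"chip{i}": {"scaling_factor": 1.0, "present_pct": 45.0,
                      "gapdh_3to5": 1.2} for i in range(1, 8)}
chips["chip8"] = {"scaling_factor": 5.0, "present_pct": 20.0,
                  "gapdh_3to5": 4.0}
failures, range_ok = review_project(chips)
```

One caveat of a mean-and-standard-deviation rule is that the outlier itself inflates the standard deviation: with fewer than six chips, no single chip can lie more than two sample standard deviations from the mean, so small projects need visual inspection as well.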

The QC criteria provided above are based 
on MAS 5.0 probe-set algorithms and data 
analyses. The measures of present calls and 
scaling factors are useful and serve as initial 
summary measures of the performance of a 
particular microarray. However, more focused 
statistical methods, coupled with routine 
visual inspection of images, hold promise for 
the continuing improvement of data quality 
and screening abilities. 

Large-scale analyses of microarray data 
across laboratories have not yet been reported. 
However, the Working Group feels that adher- 
ence to the above QC criteria, using standard 
RNA isolation and processing methods, should 
yield data that are consistent between laborato- 
ries and intrinsically comparable. The same 
set of criteria can also be used as best practices 
for data generation in the design and conduct 
of clinical trials. 

Standard clinical laboratory practice is 
to develop programmes for submission of 
known samples to different laboratories 
and assessment of comparability of results. 
Such programmes are under development 
within larger collaborative efforts, such as 
the National Heart, Lung and Blood Institute 
(NHLBI) Programs in Genomic Applic- 
ations (see the HOP GENE Program for 
Genomic Applications in online links box) 
and the National Institute of General Medical 
Sciences (NIGMS) Glue Grant (see online 
links box). 

Data analysis and interpretation 

Signal generation versus statistical analyses. 
Two relatively distinct steps underlie all 
data analyses of Affymetrix oligonucleotide
microarrays: the development of a normalized
'signal' for each transcript on each
microarray and the subsequent statistical 
analysis of differences in signals between dif- 
ferent arrays. The first step involves probe- 
set algorithms that use all, or part, of the 
component signals within a probe set and 
then derive a single signal that is representa- 
tive of the relative abundance of each mRNA 
queried in each array. The second step is the

Box 4 | Quality control metrics for Affymetrix microarrays

RNA quality

Optical density 260/280 of 1.8-2.1 | Agilent Bioanalyzer | Gel electrophoresis

cDNA/cRNA efficiency

>4-fold amplification from total RNA | 500-3,000 bp prior to fragmentation | 50-200 bp after
fragmentation

Chip hybridization

Image inspection for defects | Scaling factors within two standard deviations within a project |
MAS 5.0 present calls >25% for the A-series arrays, including the HG-U133 Plus 2.0 array |
Percentage present calls for the B- and C-series arrays are typically lower | 3'/5' GAPDH
ratios <3

Project normalization

The detection of statistical outliers for chips, probe sets or individual probe pairs requires
normalization and analysis across an entire project. This is afforded by the dCHIP and
ProbeProfiler, and other software packages. Data-analysis packages that rely on intra-chip
normalization and scaling typically do not enable detection of statistical outliers.

Chip outliers | Probe-set outliers | Probe-pair outliers | Range in present calls <10%



application of bioinformatic and statistical 
methods to identify interesting subsets of 
the assembled data of all arrays within a 
project. There is considerable debate about 
the best methods for both of these steps (see 
below for a discussion). Although the two 
steps are separable, it is clear that they have a 
marked influence on each other. It is in this 
realm that the bioinformatics of microarrays 
becomes avant-garde, and with the ground- 
breaking nature of research comes consider- 
able debate as to what is appropriate in any 
specific situation. 

Before discussing the different methods for 
probe-set analysis and data interpretation, it is 
important to point out that much of the 
debate in the field of bioinformatics about 
microarray interpretation revolves around sig- 
nal/noise ratios. A common assumption is that 
signal/noise ratios across a microarray are 
homogeneous, or at least similar in magnitude. 
This might be true for general background 
hybridization, but not for the performance of 
probe sets. In any particular microarray, there 
are probe sets that give very strong and clear 
hybridization patterns and those that perform 
poorly. Many of the best performing probe sets 
(those with a highly significant probe-set 
detection p value) reflect highly expressed tran- 
scripts with no closely related sequences that 
might cross-hybridize. Low-level transcripts, or 
transcripts that belong to gene families with 
highly homologous sequences derived from 
distinct genes, often have corresponding probe 
sets that do not perform as well and might 
have a significant, if not overwhelming, noise 
component. The signal from such probe sets 
is difficult to interpret, and data interpreta- 
tion can be limited to only the best perform- 
ing probe sets, although arguably the most 
interesting data comes from the genes that 
are expressed at low levels but that still show 
significant differences between samples. 

Determining adequate sensitivity of the 
signals and signal/noise responses relative to 
the absolute quantity of mRNA in clinical 
samples is crucial as microarrays become a 
component of clinical trials and diagnostic 
models. Affymetrix arrays provide a concen- 
tration of each mRNA queried relative to the 
genome-wide mRNA profile of the sample; it 
is assumed that the global mRNA content of a 
tissue as a whole does not change significantly,
making relative mRNA quantification an 
accurate reflection of the response of the indi- 
vidual gene. This method differs from absolute 
quantification of specific mRNAs (such as
S1 NUCLEASE PROTECTION and REAL-TIME PCR), or the
isolated transcript ratio determined by co-
hybridization of two samples to spotted
cDNA or oligonucleotide arrays (BOX 2).



Affymetrix arrays achieve considerable sensi- 
tivity through the inherent redundancy of 
the probe set; however, the Working Group 
acknowledged that some genes, such as some 
cytokines that are functional at very low 
expression levels, are probably below the limit 
of detection. 

The Working Group agreed that each 
project will have its own signal/noise opti- 
mum, and analysis methods that prove best 
for one project might prove unsuitable for 
another. Ideally, a signal/noise ratio should
be optimized for each project or trial using 
different probe-set algorithms and data- 
filtering methods, and some systematic 
efforts towards this end are beginning to 
appear in the literature 17 .

After a signal is derived for each probe set, 
data is interpreted using statistical and visual- 
ization methods. All statistical methods run 
into two generic problems when faced with 
microarray data that are inter-related. The 
first is the curse of dimensionality — each 
gene is potentially related to every other gene, 
so all permutations of all available data must 
be considered, leading to an exponentially 
increasing number of possible associations in 
multidimensional space. The problem arises
when associations (samples) become lost as
the dimensionality increases — associations 
lose their local value and become generically 
global in statistical terms. Statistical models 
attempt to circumvent this curse by requesting 
larger and larger sample sizes, but fulfilling 
the requests becomes functionally impossi- 
ble for the experimentalist. There is no easy 
answer to these problems and they remain a 
challenge for future bioinformatics research 
that uses microarrays 18 .

Derivation of signal: probe-set algorithms and 
normalizations. One of the key advantages of 
the Affymetrix platform is the multiple measurements
that are intrinsic to the probe set:
most probe sets include 11 perfect-match and
11 paired-mismatch 25-bp oligonucleotides
per gene (FIG. 1). Previous versions of
GeneChip arrays used probe-set design meth- 
ods that led to considerable overlap between 
probes, so that hybridizations to each fea- 
ture/probe were not independent measure- 
ments; this led to considerable uncontrolled 
weighting of the contribution of any particu- 
lar region of sequence to the resulting signal. 
All recent chips use a much more refined 
probe-set design with less overlap and consid- 
erably better performance of the probe set. 
Improvements in array and probe-set designs 
have been accompanied by an evolution in 
primary analysis algorithms and the support- 
ing software provided by Affymetrix for data 
analysis and interpretation 19 . Affymetrix 
default algorithms are based on well-docu- 
mented statistical methods, namely the robust
TUKEY'S BI-WEIGHT ESTIMATOR and WILCOXON'S SIGNED
RANK, to calculate the final probe-set signals
and associated p values, respectively 19,20 .
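
To show why a bi-weight estimate is robust, the sketch below implements a one-step Tukey bi-weight mean in Python. It is an illustration of the general technique, not the Affymetrix code: the tuning constant c = 5 and the small epsilon are assumptions made here, the function name is hypothetical, and the additional steps a probe-set algorithm applies before averaging (background and mismatch handling, log transformation) are left out.

```python
from statistics import median

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey bi-weight mean of a list of numbers.

    Points far from the median, measured in units of the median
    absolute deviation (MAD), receive little or no weight.
    c and epsilon are assumed tuning constants.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    centre = median(values)
    mad = median([abs(x - centre) for x in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for x in values:
        u = (x - centre) / (c * mad + epsilon)
        weight = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += weight * x
        weight_total += weight
    return weighted_sum / weight_total

# Five probe-level values, one of them a gross outlier (made-up numbers).
probe_values = [10.0, 11.0, 9.0, 10.0, 100.0]
robust = tukey_biweight(probe_values)
plain = sum(probe_values) / len(probe_values)
```

In this example the outlier receives zero weight, so the robust estimate stays at about 10 while the ordinary mean is pulled up to 28.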
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Table 1 | Comparisons of probe-set analysis algorithms 



Algorithm              | Penalty for     | Normalization    | Outlier detection | Sensitivity* | Specificity‡
                       | mismatch signal | method           | and correction    |              |
-----------------------+-----------------+------------------+-------------------+--------------+-------------
Affymetrix MAS 5.0     | High            | Individual chips | Little            | Good         | Excellent
dCHIP difference model | High            | Cross-project    | Moderate          | Good         | Excellent
dCHIP                  | None            | Cross-project    | Moderate          | Excellent    | Good
RMA                    | None            | Cross-project    | Moderate          | Excellent    | Good
ProbeProfiler          | Moderate        | Extensive        | Extensive         | Good         | Good

*Sensitivity is based primarily on ROC (receiver operating characteristic) curves of spike-in mRNA data based on published reports (see http://www.bioconductor.org) 21-23 .
‡Specificity measurements are based both on expectations from mismatch weights and published observations in experimental data sets 17,18 .



Affymetrix has announced plans to con- 
tinue to improve the software components of 
the GeneChip platform. The upcoming release 
of the GeneChip Operating System (GCOS) 
is expected to incorporate refinements in the 
user interface, data management and analysis 
algorithms. Software tools aside, the most sig- 
nificant development on the analysis front is 
arguably the decision by Affymetrix to release 
previously proprietary chip-design details, 
such as probe sequences, chip-design parame- 
ters and file APIs (applications programming 
interfaces). The goal is to encourage scientists
to develop innovative analysis tools that can 
potentially derive more biological value from 
GeneChip expression data. The challenge of 
providing a constantly growing and evolving 
body of information associated with arrays 
has been solved in part with a web-based 
tool. The company's NetAffx web site (see
online links box) serves as the public portal 
for detailed information on chip design and 
has become a valuable resource for biological 
follow-up of GeneChip expression results. 
Third-party software developers can find 
additional support, including information on 
file APIs, through the Affymetrix Developers'
Network (see online links box). 

Encouraged in part by the openness of the 
platform and spawned by an increase in 
knowledge and experience in array data 
analysis, scientists are developing a number of 
alternative algorithms for probe-set analysis, 
with the goal to derive the best signal that is 
representative of the mRNA level for each 
gene. As each signal is relative to other signals 
in the experiment (both between arrays for 
the same gene and relative to all other genes 
on the array), the process of normalization is 
intimately tied to derivation of signal. The 
more commonly used alternative probe-set 
analysis algorithms include dCHIP 20 , RMA 21 
and ProbeProfiler (TABLE 1). 

It is outside the scope of this article to dis- 
cuss the nature of the different probe-set 
interpretation and normalization algorithms 
in depth, and the reader is referred elsewhere 22 . 



The algorithms differ in a number of important
ways (TABLE 1). First, the penalty weight
that is assigned to the mismatch probe varies:
MAS 5.0 assigns a relatively heavy penalty
for cross-hybridization to the mismatch
probe, RMA assigns no weight and dCHIP
gives the choice of providing weight or no
weight. Second, the ability to discard outlier
signals varies from package to package, with 
dCHIP and ProbeProfiler having refined 
methods to detect outliers at each level of 
analysis (probe, probe set and microarray). 
These packages are able to replace deviant 
probes with expected data based on the 
remainder of the probe set, and/or flag abnor- 
mal probe sets and arrays for possible exclu- 
sion from further data analysis. Third, the 
method of normalization varies from within a
single array (MAS 5.0) to a project-based normalization
(dCHIP, RMA and ProbeProfiler).
Finally, MAS 5.0 provides a detection p value, 
in which a number is assigned to the confi- 
dence of the signal in question. This can be 
used to weight different probe-set signals in 
subsequent data interpretations. 
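
A detection p value of this kind can be sketched as a one-sided Wilcoxon signed-rank test on the probe pairs of one probe set. The Python below is a hedged illustration, not the MAS 5.0 implementation: the discrimination score (PM - MM)/(PM + MM), the threshold tau = 0.015 and the call cut-offs of 0.04 and 0.06 are assumptions made here and should be checked against the vendor's documentation. All function names are hypothetical.

```python
def signed_rank_p_value(differences):
    """Exact one-sided p value for H1: median difference > 0."""
    d = [x for x in differences if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    # Midranks of |d|, stored doubled so tied ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks i+1..j+1
        i = j + 1
    observed = sum(r for r, x in zip(ranks2, d) if x > 0)
    # Null distribution: each rank is positive with probability 1/2.
    counts = {0: 1}
    for r in ranks2:
        updated = dict(counts)
        for total, ways in counts.items():
            updated[total + r] = updated.get(total + r, 0) + ways
        counts = updated
    extreme = sum(ways for total, ways in counts.items() if total >= observed)
    return extreme / 2 ** n

def detection_p_value(pm, mm, tau=0.015):
    """p value that perfect-match signal exceeds mismatch signal."""
    scores = [(p - m) / (p + m) - tau for p, m in zip(pm, mm)]
    return signed_rank_p_value(scores)

def detection_call(p, alpha1=0.04, alpha2=0.06):
    """Translate a p value into Present, Marginal or Absent."""
    if p < alpha1:
        return "P"
    if p < alpha2:
        return "M"
    return "A"

# Eleven probe pairs with clearly higher perfect-match signal (made up).
pm = [150 + 10 * i for i in range(11)]
mm = [100] * 11
p = detection_p_value(pm, mm)
call = detection_call(p)
```

With 11 probe pairs that all favour the perfect match, the smallest achievable p value is 1/2048, which is why a probe set with few informative pairs can never reach a very small detection p value.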

The output of all packages is a normalized 
signal (with or without an associated detec- 
tion p value) for each probe set on each array. 
These signals are then fed into data interpre- 
tation packages for statistical analyses and 
data visualizations. 

Different probe-set interpretation algo- 
rithms lead to different results. Members of 
the Working Group often encounter ~50%
concordance in general data output in their
own work between comparisons of two dif- 
ferent algorithms. However, it is crucial to 
note that the large majority of discordant data 
lies in regions of relatively poor signal/noise 
ratios, and concordance deteriorates in exper- 
iments with high levels of confounding noise. 
In general, the programmes that put less 
weight on the mismatch show better sensitivity
(linearity) when signals are noisy (TABLE 1).
However, this increased sensitivity can come 
at a cost of substantial contaminating noise 17 . 

The Working Group recommends using at 
least two probe-set algorithms for compari- 
son and prioritization of gene selection (for 
example, MAS 5.0 and the dCHIP difference 
model). 
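
One simple way to act on this recommendation is to compare the gene lists that two algorithms produce and to prioritize the genes selected by both. The sketch below is illustrative only: the text does not define how concordance is measured, so the overlap-over-union measure used here is an assumption, and the function name and gene lists are hypothetical.

```python
def concordance(genes_a, genes_b):
    """Return (genes selected by both lists, overlap / union)."""
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    fraction = len(shared) / len(union) if union else 0.0
    return sorted(shared), fraction

# Hypothetical gene lists from two probe-set algorithms.
from_mas5 = ["TP53", "MYC", "EGFR"]
from_dchip = ["MYC", "EGFR", "KRAS"]
shared, fraction = concordance(from_mas5, from_dchip)
```

Here two of the four distinct genes appear in both lists, giving a concordance of 0.5 under this definition; the shared genes would be the first candidates for follow-up.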

Data interpretation. Most published microar- 
ray papers could be considered data-poor in 
terms of replicates and systematic statistical 
analyses, but data-rich both in terms of 
amount of high-quality data generated and 
significant research findings. Below, we point 
out the most appropriate current bioinfor- 
matics methods and additional methods that 
require further development so that data can 
be more fully mined for information content. 

A second general backdrop to the follow- 
ing discussion is that data visualization is 
one of the most powerful data interpretation 
tools, yet it rarely obeys statistical principles. 
The resolution of the human eye, coupled 
with the abstract computational power of the 
human brain, lies behind the popularity of 
hierarchical clustering and other non-statistical 
principles and visualization methods. How- 
ever, the eye and brain are poorly suited to 
spontaneously deriving statistical support.

There are two general types of experimen- 
tal design that lend themselves to different 
types of statistical and visual analysis: the 
cross-sectional study and the time-series study. 
The cross-sectional study typically has gene or 
pattern selection as the goal: the identification
of one or more genes or patterns of expression
that are diagnostic of the condition or state
under study. This 'gene selection' might be for
truly diagnostic purposes (for example, differential
diagnosis of leukaemia), or might be
intended to identify relevant biochemical
pathways. In both cases, the gene or pattern
selection must be robust, usually implying a
statistically principled approach, with subsequent
validation by predictive computer modelling
(internal cross-validation) or, preferably,
prospective validation on new data.

234 | MARCH 2004 | VOLUME 5

www.nature.com/reviews/genetics

PERSPECTIVES

Feature selection can be the main limiting 
factor in evaluation of the predictive perfor- 
mance of an analysis method when there are 
many predictors to select from. This was a 
'mantra' for some of the senior statisticians
involved in predictive modelling with gene- 
expression array data for several years, but 
only now do the non-statistical users and 
developers of predictive models from non- 
statistical perspectives begin to appreciate 
these issues. Proper validation of any model or
algorithm that relies on explicit feature selection
(such as choosing a subset of 70 genes
from 20,000) that underlies the resulting
prediction simply must ensure that the analysis
is tested by internal cross-validation that
includes feature re-selection as part of the
validation 23,24 . The Working Group acknowledged
that prospective validation of any
findings using new data is the acid test of 
predictive performance. The focus on fea- 
ture or gene selection is vitally important 
when microarrays are used for differential 
diagnosis and has been best studied in cancer 
biopsy/tissue studies. 
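
The requirement that feature re-selection be part of the validation can be made concrete with leave-one-out cross-validation in which the genes are chosen again inside every fold, using the training samples only. The sketch below is a minimal illustration in plain Python. The ranking rule (absolute difference in class means), the nearest-centroid classifier, the function names and the toy data are all choices made here for brevity and are not methods prescribed by the Working Group.

```python
from statistics import mean

def select_features(samples, labels, k):
    """Rank genes by absolute difference in class means; keep the top k."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for g in range(len(samples[0])):
        m0 = mean(s[g] for s, y in zip(samples, labels) if y == classes[0])
        m1 = mean(s[g] for s, y in zip(samples, labels) if y == classes[1])
        scores.append((abs(m0 - m1), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]

def predict(train, train_labels, features, sample):
    """Assign the sample to the class with the nearest centroid."""
    best_label, best_dist = None, None
    for label in sorted(set(train_labels)):
        rows = [s for s, y in zip(train, train_labels) if y == label]
        centroid = [mean(r[g] for r in rows) for g in features]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(features, centroid))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(samples, labels, k):
    """Leave-one-out accuracy with features re-selected in every fold."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # The held-out sample plays no part in choosing the genes.
        features = select_features(train, train_labels, k)
        if predict(train, train_labels, features, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data: six samples, four genes; only the first gene is informative.
samples = [[1, 5, 3, 2], [2, 4, 3, 3], [1, 6, 2, 2],
           [9, 5, 3, 2], [8, 4, 2, 3], [9, 6, 3, 2]]
labels = ["normal"] * 3 + ["tumour"] * 3
accuracy = loocv_accuracy(samples, labels, k=1)
```

Selecting the genes once on all samples and then cross-validating only the classifier would let the held-out sample influence the gene list, which tends to overstate accuracy when there are many candidate genes.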

An increasing proportion of microarray 
studies focus on delineation of biochemical 
pathways that are modulated in response 
to some stimulus. In practice, these studies 
typically use feature selection to identify 
potential pathways that are involved in the 
response of the cells or tissues. Validation 
is then done on the identified biochemical 
pathways of interest, using mRNA (real-time 
PCR) or protein studies, often proving cause 
and effect in experimental models. 

The Working Group notes that robust 
feature selection for the purpose of diagnosis
and molecular markers in clinical trials
requires robust statistical methods, as out- 
lined below, and the burden of proof lies 
with statistical validation. For microarray 
experiments designed to delineate biochem- 
ical pathways, feature selection is used for 
generating a hypothesis and the burden of 
proof of the hypothesis lies with laboratory- 
based research, often at the protein level. 

For feature selection, the Working Group 
recommends that users experiment with 
various statistical methods (such as standard 
parametric tests, nonparametric methods, 
false discovery rate and related methods 25 , 
global or local shrinkage of raw signal inten- 
sities and Stanford's 'nearest shrunken cen- 
troids' 26 ). Developments related to survival 
data analysis are receiving increased attention 
because clinical trials will raise the need to 
move that way. As a corollary, analysis meth- 
ods that focus on signatures of groups of 
genes (such as averages of clusters, Duke's 
metagenes 27-29 and Stanford's eigengenes 30 )
seem worth stressing in predictive contexts. 
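
As one concrete example of a false-discovery-rate method, the sketch below implements the Benjamini-Hochberg step-up procedure in Python. The text does not say which FDR method it has in mind, so this choice is an assumption made for illustration, and the function name and p values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR.

    Step-up rule: sort the p values, find the largest rank r with
    p(r) <= r * fdr / m, and reject every test up to that rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Made-up per-gene p values from a two-group comparison.
p_values = [0.041, 0.001, 0.205, 0.008, 0.039]
selected = benjamini_hochberg(p_values, fdr=0.05)
```

In this example only the genes at positions 1 and 3 are selected, although four of the five p values fall below 0.05 when each is taken in isolation.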



Glossary 

A-, B- AND C-SERIES ARRAYS 

A series of human, rat and mouse Affymetrix arrays 
released in 2003, in which the A array contained the 
best-characterized genes, and B and C arrays contained
less well-defined expressed sequence tags. In 2004, all 
probe sets have been condensed so that there is only 
one microarray per species that covers the entire 
genome. 

CROSS-SECTIONAL DESIGN 

The use of different subjects in an experimental and 
control group or groups. The statistical analysis 
compares the median and variation within each 
group relative to the other groups. 

FEATURE 

Typically one element (spot) on a microarray. 
In spotted cDNA or oligonucleotide arrays, 
features correspond to genes or transcripts; in 
Affymetrix arrays, there are typically 
22 elements per probe set and often multiple 
probe sets per gene, so a feature might refer 
to a single oligonucleotide, a probe pair or 
a probe set, or a gene with multiple probe 
sets. In bioinformatics it is most often 
synonymous with a gene. 

FLUORESCENCE-ACTIVATED CELL SORTING 
(FACS). A method whereby dissociated and individual 
living cells are sorted, in a liquid stream, according to the 
intensity of fluorescence that they emit as they pass 
through a laser beam. 

FLUOROPHORE 

A small molecule, or a part of a larger molecule, 
that can be excited by light to emit fluorescence. 

ISCHAEMIA 

The loss of blood supply, and hence oxygenation, to a 
tissue or cells. 



LASER CAPTURE MICRODISSECTION 
A technique in which individual cells, or regions of tissue, 
are excised from a histological preparation, using specially 
equipped microscopes, and isolated for further study. 

LONGITUDINAL DESIGN 

The use of multiple samples from the same subject. With
this design, each subject serves as their own control,
eliminating confounding inter-individual variations at
baseline; paired t-tests are used to interpret the data.

NEGATIVE CELL ISOLATION 

The use of antibodies or other reagents to remove all 
unwanted cells from a mixed population of cells. In this 
method, the desired cells are not exposed to bound 
antibodies, thereby avoiding potential activation or other 
molecular alteration in the desired cells. 

PENALTY WEIGHT 

In Affymetrix arrays, hybridization to the 'mismatch' 
probe of a probe pair might or might not be considered 
as a form of measurement of noise or background, and 
can be factored into the signal seen with the paired 
'perfect match' as a penalty weight.

PHOTOLITHOGRAPHY 

The process of using light to either etch or activate 
regions of a surface (substrate). This method is used in 
microelectronics to create integrated circuits and 
processors. 

REAL-TIME PCR 

The quantification of the amount of PCR product
during each cycle of a PCR reaction. The product 
concentration, as a function of cycle number, provides 
a good estimation of the relative quantity of the 
mRNA being tested. 
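
As a rough illustration of how cycle number relates to relative quantity, the sketch below assumes the widely used threshold-cycle (Ct) convention, in which every cycle multiplies the product by a fixed efficiency (2.0 for ideal doubling). The function name, the Ct values and the default efficiency are assumptions made for illustration and are not part of the definition above.

```python
def relative_quantity(ct_sample, ct_reference, efficiency=2.0):
    """Estimate the sample/reference template ratio from threshold cycles.

    Assumes both reactions amplify with the same per-cycle efficiency.
    A sample that reaches the threshold earlier (lower Ct) started with
    more template.
    """
    return efficiency ** (ct_reference - ct_sample)


# Hypothetical Ct values: the sample crosses the threshold 3 cycles earlier.
ratio = relative_quantity(ct_sample=22.0, ct_reference=25.0)
```

Under ideal doubling, a difference of three cycles corresponds to an eight-fold difference in starting template; real assays usually need a measured efficiency and a reference transcript for normalization.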

RESECTION 

Surgical removal of tissue, most commonly used for 
removing tumorous masses from surrounding tissue. 



S1 NUCLEASE PROTECTION
An experimental method for determining mRNA 
transcript concentration in a tissue or cell RNA sample. It 
involves using labelled DNA probes that bind the RNA, 
with overhanging non-hybridized tails of the probe then 
being digested by the S1 nuclease. This creates a smaller
labelled DNA probe that is indicative of the abundance of 
the mRNA being tested. 

SURVIVAL DATA ANALYSIS 

A battery of statistical methods applied to data when 
mortality is often the only, or best, measured outcome. 

TIME-SERIES STUDY 

The use of a series of samples taken at defined time 
points after a defined stimulus. In mice and rats, the 
samples at different time points are usually from 
different animals. In humans, time-series studies are 
necessarily longitudinal to avoid additional confounding 
noise. 

TUKEY'S BI-WEIGHT ESTIMATOR
Many statistical tests require underlying definitions 
that are assumed to be valid (for example, tumour 
versus non-tumour), and require data that show a 
normal distribution. Microarray data, and the 
clinical information underlying the definition of 
samples, is often less exact, with genes or samples 
often performing as statistical outliers. Tukey's
bi-weight estimator is one of the M-class of statistical
models that is less sensitive to outliers and performs 
more gracefully when underlying assumptions are 
inexact. 
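
To make the outlier resistance concrete, here is a minimal sketch of a one-step Tukey bi-weight estimate of location. The tuning constant `c = 5` and the small `eps` guard are assumptions chosen for illustration; the glossary entry above does not specify them.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey bi-weight estimate of the centre of `values`.

    Points are weighted by their distance from the median, scaled by the
    median absolute deviation (MAD); points further than c * MAD away
    receive zero weight.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical probe intensities with one gross outlier.
signals = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
robust = tukey_biweight(signals)
plain_mean = sum(signals) / len(signals)
```

With these made-up values the outlier pulls the ordinary mean well away from the bulk of the data, whereas the bi-weight estimate stays within the range of the five consistent measurements.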

WILCOXON'S SIGNED RANK 
A statistical test that investigates the population 
median of paired differences. It is well suited for 
microarray work as it treats each gene as an 
independent variable and does not require normal 
distributions of the data.

Figure 2 | Dense time-series data with adequate replicates can provide robust visual
interpretation of data. Dynamic, single-gene queries can provide visually compelling results
and avoid many issues that complicate statistical analyses of cross-sectional microarray studies.
Dynamic queries of the got1 transcript probe set from a time-series study are shown. The x-axis represents
time (in hours) after rats were given a bolus of 3-methylprednisolone. The y-axis represents relative
expression level. Individual rats were studied at each time point (green data points), with both liver (panel a)
and muscle (panel b) tissue taken from the same animals. Averages of the replicates for each data point are
shown (in magenta); the graph line is drawn between the averages. At baseline (time 0), the got1 transcript
has a normalized signal of about 2,400 in liver and 1,200 in muscle. The gene is clearly responsive to
3-methylprednisolone in liver (panel a), where expression rapidly increases within the first hour, plateaus
between 1-7 hr, then quickly falls back to baseline by 12 hr. The replicates appear relatively consistent.
By contrast, the same gene in muscle does not seem to respond to the drug; the variability in replicates
is larger than any temporally-relevant change. Data from http://microarray.cnmcresearch.org/singlegenemain.asp
and REFS 6,32.



Whatever the specific statistical model that is 
applied for prediction, using aggregate gene 
expression has important consequences: 
measures of aggregation of expression over a 
group of genes with related profiles can 
reduce dimension (thereby mitigating the 



curse of dimensionality). This can reduce 
multiplicities and, to some degree, ease the 
problems of gene selection, multiple testing 
and co-linearity, while improving signal 
estimation by averaging correlated noise 
components. 
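
The effect of aggregation on noise can be sketched with a small simulation. The shared signal, the noise level, the number of genes and the helper names below are all invented for illustration; the point is only that the averaged profile tracks the shared pattern more closely than a typical single gene does.

```python
import random
from statistics import mean


def metagene(profiles):
    """Average a group of gene profiles sample by sample."""
    return [mean(column) for column in zip(*profiles)]


def rms_error(profile, reference):
    """Root-mean-square distance between a profile and a reference pattern."""
    return mean([(p - r) ** 2 for p, r in zip(profile, reference)]) ** 0.5


random.seed(0)
shared_signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # hypothetical pattern over six samples
genes = [[s + random.gauss(0, 0.5) for s in shared_signal] for _ in range(20)]

aggregate = metagene(genes)
aggregate_error = rms_error(aggregate, shared_signal)
typical_gene_error = mean([rms_error(g, shared_signal) for g in genes])
```

Twenty noisy profiles collapse into one summary variable, which is the dimension reduction described above, and `aggregate_error` comes out smaller than `typical_gene_error`.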



Data visualizations, time series and candidate 
genes. The above discussions of biostatistics 
all assume that the analysis is targeted towards 
a cross-sectional study, in which the primary 
goal is diagnostic gene discovery (gene or fea- 
ture selection). In other words, a series of 
microarrays with a very large number of tran-
scripts defines the very small minority of
genes that are correlated and therefore predic- 
tive of the biological variable of interest. 
There are alternatives to this standard experi- 
mental design that use entirely different types 
of analysis, and the statistical issues are also 
quite different, as explained below. 

The time-series study, if done with enough 
time points, can provide an effective antidote 
to the curse of dimensionality — the action of 
any gene during a time-series study should 
make biological sense, such that each signal is 
relatively easily discernible from noise. Visual 
query of a large time-series data set for single 
gene responses to the controlled variable either 
might meet expectations and is therefore valid, 
or might not meet expectations and is dis- 
carded as uninteresting. As an example, we 
show a time-series study in which rats are given 
a bolus of methylprednisolone, after which 
their liver and muscle are studied as a function 
of time (FIG. 2). In this case, the same gene (got1)
is queried using a web-based dynamic visual-
ization tool, first in liver (FIG. 2a) and then in
muscle (FIG. 2b). The data in the top panel are
visually compelling; got1 in liver responds
quickly and strongly to a bolus of 3-methyl- 
prednisolone, with relatively consistent repli- 
cates (each data point comes from a different 
animal) and a time course that is visually assur- 
ing so that complex statistical tests of the tran- 
scriptional response as a function of time are 
not needed. On the other hand, the same gene 
in muscle does not seem to respond to the 
drug 6,31 (FIG. 2b). Through such gene queries,
the variability in replicates and the appropri- 
ateness of the action of the gene as a function 
of time can quickly be assessed. Another 
advantage of time-series data is that such pro- 
files act as biomarkers that are amenable to 
analysis and interpretation using pharmacody- 
namic models that predict the underlying 
mechanisms of control of gene expression 32 . 

The Working Group agreed that data-rich, 
time-series experimental designs provide 
some latitude in reporting significant findings 
and that the query of individual genes within 
large data sets can circumvent complex issues 
of multidimensionality of data. 
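
The single-gene query described above amounts to comparing the size of the time-dependent change with the spread among replicates. The sketch below does this for two invented data sets that loosely mimic a responsive and a non-responsive tissue; the numbers, the dictionary layout and the function name are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def summarize(series):
    """Summarize a time series given as {time_in_hours: [replicate signals]}.

    Returns the per-time-point means, the range of those means (response)
    and the average within-time-point standard deviation (noise).
    """
    means = {t: mean(reps) for t, reps in sorted(series.items())}
    response = max(means.values()) - min(means.values())
    noise = mean([stdev(reps) for reps in series.values()])
    return means, response, noise


# Hypothetical replicate signals (three animals per time point).
responsive = {0: [2300, 2500, 2400], 1: [4700, 5100, 4900], 4: [5000, 4800, 5200],
              7: [4900, 5100, 5000], 12: [2500, 2300, 2400]}
flat = {0: [1000, 1400, 1200], 1: [1300, 1050, 1250], 4: [1100, 1350, 1150],
        7: [1400, 1000, 1200], 12: [1250, 1100, 1300]}

_, responsive_change, responsive_noise = summarize(responsive)
_, flat_change, flat_noise = summarize(flat)
```

In the first data set the change over time dwarfs the replicate spread; in the second the spread is larger than any change, which is the pattern a visual query would flag as uninteresting.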

Future areas of development 

The data-rich and highly dimensional nature of 
microarray data serves as a model for future 
dissection and understanding of biological 
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systems in general, including proteomics and 
integration of mRNA profiling and proteomics. 
The Working Group discussed data analysis 
needs within the microarray community and 
agreed that, along with the incorporation of 
QC, SOPs and optimized or customized sig- 
nal/noise analyses in initial project signal gener- 
ation, the back-end statistics needs to reach a 
commonly accepted method of dealing with the 
curse of dimensionality before microarrays can 
be reliably used in clinical trials. Statisticians 
need to focus more on representation of pre- 
diction results in terms of probabilities and 
associated measures of uncertainty, and reach a 
consensus on what is acceptable. In the mean- 
time, it is likely that specific marker or diagnos- 
tic genes will be extracted from pilot profiling 
studies, and then only this small subset of genes 
will be used as a clinical trial endpoint. This data 
limitation approach removes much of the curse 
of dimensionality, but is liable to ignore the 
large majority of data, thereby decreasing the 
potential power of the study and bringing into 
question the use of microarrays in clinical trials. 

A move towards the standardization of 
reporting of prediction accuracy would be 
desirable when assessing predictive accuracy 
through within-sample cross-validation. The 
Working Group suggests that one or more val- 
idation techniques be used when reporting 
predictive genes: leave-one-out and 10% 
cross-validation summaries, or true validation 
data sets. Communicating uncertainty about 
predictive performance is also key and will 
help evaluate results based on varying sample 
sizes. The Working Group suggests that until 
this information is routinely presented in pub- 
lished papers, it will be difficult to reach an 
acceptable consensus for use in clinical trials. 
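
A leave-one-out summary of the kind suggested above can be sketched as follows. The nearest-centroid classifier, the two-gene toy data and all names are assumptions chosen to keep the example short; they are not the Working Group's method.

```python
from statistics import mean


def nearest_centroid_predict(train, sample):
    """Predict the label whose class centroid lies closest to `sample`.

    `train` is a list of (feature_list, label) pairs.
    """
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [feat for feat, lab in train if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out_accuracy(data):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid_predict(train, features) == label
    return hits / len(data)


# Hypothetical expression values for two genes in six samples.
samples = [([5.1, 1.0], "tumour"), ([4.8, 1.2], "tumour"), ([5.3, 0.9], "tumour"),
           ([1.0, 4.9], "normal"), ([1.2, 5.2], "normal"), ([0.9, 5.0], "normal")]
accuracy = leave_one_out_accuracy(samples)
```

In a real study any gene selection step would also have to be repeated inside each held-out round, otherwise the reported accuracy is optimistic; with small sample sizes the estimate should be reported together with its uncertainty.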

Conclusions 

There are four key areas of optimization and 
standardization that are largely independent: 
study design, technical variability (QC/SOP of 
data generation), analysis method variation 
(signal/noise optimization using probeset 
algorithms and normalizations) and back-end 
statistical analyses. Statistics of clinical trial 
design is crucial: gene-expression data does 
not mitigate the need for sound and relevant 
design and analysis, nor does it challenge what 
we know about design. The field is quickly 
maturing from the small-chip-number hit-
and-run type projects to those with a more
robust study design. However, study design 
depends ultimately on appropriate powering 
of a study, which is greatly affected by both 
the chip-analysis algorithms that are used and
the biostatistical data analysis. 

Development of back-end statistical meth- 
ods for data representation/summary and for 



high-level analysis remains an active area of 
research for both academic and commercial 
users, and is likely to remain so in the near 
future. We are some way from defining stan- 
dards of summary signal intensities alone 
and even further from considerations of 
standardization of analytical methods for 
inference and prediction in clinical contexts. 
In regulated clinical studies, such standards 
will be enforced partly by the US FDA as sub- 
missions of medical test/device protocols 
emerge and increase in number. Even then, 
however, many approaches to data analysis 
and modelling will be used and developed, 
which is, of course, to be supported. It is very 
difficult to influence the research commu- 
nity, especially when the variety of problems 
that are encountered promotes the need for 
refined and new approaches.

FROM DNA TO RNA

Transcription and translation are the means by which cells read out, or express, 
the genetic instructions in their genes. Because many identical RNA copies can 
be made from the same gene, and each RNA molecule can direct the synthesis 
of many identical protein molecules, cells can synthesize a large amount of 
protein rapidly when necessary. But each gene can also be transcribed and 
translated with a different efficiency, allowing the cell to make vast quantities of 
some proteins and tiny quantities of others (Figure 6-3). Moreover, as we see in 
the next chapter, a cell can change (or regulate) the expression of each of its 
genes according to the needs of the moment— most obviously by controlling 
the production of its RNA. 



Figure 6-3 Genes can be expressed 
with different efficiencies. Gene A is 
transcribed and translated much more 
efficiently than gene B. This allows the
amount of protein A in the cell to be 
much greater than that of protein B. 



Portions of DNA Sequence Are Transcribed into RNA 

The first step a cell takes in reading out a needed part of its genetic instructions 
is to copy a particular portion of its DNA nucleotide sequence — a gene — into an 
RNA nucleotide sequence. The information in RNA, although copied into another 
chemical form, is still written in essentially the same language as it is in DNA — 
the language of a nucleotide sequence. Hence the name transcription. 

Like DNA, RNA is a linear polymer made of four different types of nucleotide
subunits linked together by phosphodiester bonds (Figure 6-4). It differs from
DNA chemically in two respects: (1) the nucleotides in RNA are
ribonucleotides, that is, they contain the sugar ribose (hence the name ribonu-
cleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the
bases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U) 
instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen- 
bonding with A (Figure 6-5), the complementary base-pairing properties 
described for DNA in Chapters 4 and 5 apply also to RNA (in RNA, G pairs with 
C, and A pairs with U). It is not uncommon, however, to find other types of base 
pairs in RNA: for example, G pairing with U occasionally. 
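
The pairing rules above can be written as a small lookup table. The sketch below builds the RNA that is complementary to a DNA template strand; the function name and the example sequence are illustrative assumptions, and the strand is read base by base without modelling polymerase, promoters or strand polarity beyond the comment.

```python
# DNA template base -> RNA base that pairs with it (U replaces T in RNA).
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5' to 3') complementary to a DNA template
    strand that is written 3' to 5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


# Hypothetical six-base template.
rna = transcribe("TACGGT")
```

Every A in the template is paired with U in the product, never with T, which is the only difference from copying DNA into DNA.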

Despite these small chemical differences, DNA and RNA differ quite dra- 
matically in overall structure. Whereas DNA always occurs in cells as a double- 
stranded helix, RNA is single-stranded. RNA chains therefore fold up into a 
variety of shapes, just as a polypeptide chain folds up to form the final shape of 
a protein (Figure 6-6). As we see later in this chapter, the ability to fold into com- 
plex three-dimensional shapes allows some RNA molecules to have structural 
and catalytic functions. 



Transcription Produces RNA Complementary to 
One Strand of DNA 

All of the RNA in a cell is made by DNA transcription, a process that has cer- 
tain similarities to the process of DNA replication discussed in Chapter 5. 
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Figure 6-69 Protein aggregates that cause human disease. (A) Schematic illustration of the type of 
conformational change in a protein that produces material for a cross-beta filament. (B) Diagram illustrating 
the self-infectious nature of the protein aggregation that is central to prion diseases. PrP is highly unusual 
because the misfolded version of the protein, called PrP*, induces the normal PrP protein it contacts to
change its conformation, as shown. Most of the human diseases caused by protein aggregation are caused by
the overproduction of a variant protein that is especially prone to aggregation, but because this structure is
not infectious in this way, it cannot spread from one animal to another. (C) Drawing of a cross-beta filament,
a common type of protease-resistant protein aggregate found in a variety of human neurological diseases.
Because the hydrogen-bond interactions in a β sheet form between polypeptide backbone atoms (see Figure
3-9), a number of different abnormally folded proteins can produce this structure. (D) One of several
possible models for the conversion of PrP to PrP*, showing the likely change of two α-helices into four
β-strands. Although the structure of the normal protein has been determined accurately, the structure of the
infectious form is not yet known with certainty because the aggregation has prevented the use of standard
structural techniques. (C, courtesy of Louise Serpell, adapted from M. Sunde et al., J. Mol. Biol. 273:729-739,
1997; D, adapted from S.B. Prusiner, Trends Biochem. Sci. 21:482-487, 1996.)

animals and humans. It can be dangerous to eat the tissues of animals that con- 
tain PrP*, as witnessed most recently by the spread of BSE (commonly referred 
to as the "mad cow disease") from cattle to humans in Great Britain.

Fortunately, in the absence of PrP*, PrP is extraordinarily difficult to convert 
to its abnormal form. Although very few proteins have the potential to misfold 
into an infectious conformation, a similar transformation has been discovered 
to be the cause of an otherwise mysterious "protein-only inheritance" observed 
in yeast cells. 

There Are Many Steps From DNA to Protein 

We have seen so far in this chapter that many different types of chemical reac- 
tions are required to produce a properly folded protein from the information 
contained in a gene (Figure 6-90). The final level of a properly folded protein in 
a cell therefore depends upon the efficiency with which each of the many steps 
is performed. 

Figure 6-90 The production of a
protein by a eucaryotic cell. The final
level of each protein in a eucaryotic cell
depends upon the efficiency of each step
depicted.

We discuss in Chapter 7 that cells have the ability to change the levels of
their proteins according to their needs. In principle, any or all of the steps in
Figure 6-90 could be regulated by the cell for each individual protein. However, as
we shall see in Chapter 7, the initiation of transcription is the most common 
point for a cell to regulate the expression of each of its genes. This makes sense, 
inasmuch as the most efficient way to keep a gene from being expressed is to 
block the very first step— the transcription of its DNA sequence into an RNA 
molecule. 



Summary 

The translation of the nucleotide sequence of an mRNA molecule into protein takes 
place in the cytoplasm on a large ribonucleoprotein assembly called a ribosome. The 
amino acids used for protein synthesis are first attached to a family of tRNA 
molecules, each of which recognizes, by complementary base-pair interactions, par-
ticular sets of three nucleotides in the mRNA (codons). The sequence of nucleotides in
the mRNA is then read from one end to the other in sets of three according to the 
genetic code. 
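
Reading an mRNA "in sets of three" can be illustrated with a short sketch. The codon table below is only a small excerpt of the standard genetic code, and the function name and example sequence are assumptions for illustration; a codon outside the excerpt would raise a `KeyError`.

```python
# Excerpt of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCA": "Ala", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def translate(mrna):
    """Translate from the first AUG, three nucleotides at a time,
    until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        peptide.append(amino_acid)
    return peptide


# Hypothetical message: two leader bases, four sense codons, a stop codon.
peptide = translate("GGAUGUUUGGCGCAUAAGG")
```

Taking the first AUG as the start is a simplification; it is used here only to fix the reading frame for the example.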

To initiate translation, a small ribosomal subunit binds to the mRNA molecule 
at a start codon (AUG) that is recognized by a unique initiator tRNA molecule. A 
large ribosomal subunit binds to complete the ribosome and begin the elongation 
phase of protein synthesis. During this phase, aminoacyl tRNAs, each bearing a
specific amino acid, bind sequentially to the appropriate codon in mRNA by forming
complementary base pairs with the tRNA anticodon. Each amino acid is added to the 
C-terminal end of the growing polypeptide by means of a cycle of three sequential 
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Figure 7-5 Six steps at which 
eucaryotic gene expression can be 
controlled. Controls that operate at 
steps 1 through 5 are discussed in this
chapter. Step 6, the regulation of protein
activity, includes reversible activation or
inactivation by protein phosphorylation
(discussed in Chapter 3) as well as
irreversible inactivation by proteolytic
degradation (discussed in Chapter 6).



Gene Expression Can Be Regulated at Many of the Steps 
in the Pathway from DNA to RNA to Protein 

If differences among the various cell types of an organism depend on the partic- 
ular genes that the cells express, at what level is the control of gene expression 
exercised? As we saw in the last chapter, there are many steps in the pathway 
leading from DNA to protein, and all of them can in principle be regulated. Thus 
a cell can control the proteins it makes by (1) controlling when and how often a 
given gene is transcribed (transcriptional control), (2) controlling how the RNA 
transcript is spliced or otherwise processed (RNA processing control), (3) 
selecting which completed mRNAs in the cell nucleus are exported to the cytosol 
and determining where in the cytosol they are localized (RNA transport and 
localization control), (4) selecting which mRNAs in the cytoplasm are translated 
by ribosomes (translational control), (5) selectively destabilizing certain mRNA 
molecules in the cytoplasm (mRNA degradation control), or (6) selectively acti- 
vating, inactivating, degrading, or compartmentalizing specific protein 
molecules after they have been made (protein activity control) (Figure 7-5). 

For most genes transcriptional controls are paramount. This makes sense 
because, of all the possible control points illustrated in Figure 7-5, only tran- 
scriptional control ensures that the cell will not synthesize superfluous interme- 
diates. In the following sections we discuss the DNA and protein components 
that perform this function by regulating the initiation of gene transcription. We 
shall return at the end of the chapter to the additional ways of regulating gene 
expression. 

Summary 

The genome of a cell contains in its DNA sequence the information to make many
thousands of different protein and RNA molecules. A cell typically expresses only a
fraction of its genes, and the different types of cells in multicellular organisms arise
because different sets of genes are expressed. Moreover, cells can change the pattern
of genes they express in response to changes in their environment, such as signals
from other cells. Although all of the steps involved in expressing a gene can in prin-
ciple be regulated, for most genes the initiation of RNA transcription is the most
important point of control.



DNA-BINDING MOTIFS IN GENE REGULATORY 
PROTEINS 

How does a cell determine which of its thousands of genes to transcribe? As 
mentioned briefly in Chapters 4 and 6, the transcription of each gene is con- 
trolled by a regulatory region of DNA relatively near the site where transcription 
begins. Some regulatory regions are simple and act as switches that are thrown 
by a single signal. Many others are complex and act as tiny microprocessors,
responding to a variety of signals that they interpret and integrate to switch the
neighboring gene on or off. Whether complex or simple, these switching devices
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occur in the germ line, the cell lineage that gives rise to sperm or eggs. Most of 
the DNA in vertebrate germ cells is inactive and highly methylated. Over long 
periods of evolutionary time, the methylated CG sequences in these inactive
regions have presumably been lost through spontaneous deamination events
that were not properly repaired. However, promoters of genes that remain active
in the germ cell lineages (including most housekeeping genes) are kept
unmethylated, and therefore spontaneous deaminations of Cs that occur with-
in them can be accurately repaired. Such regions are preserved in modern day
vertebrate cells as CG islands. In addition, any mutation of a CG sequence in the 
genome that destroyed the function or regulation of a gene in the adult would be 
selected against, and some CG islands are simply the result of a higher than nor- 
mal density of critical CG sequences. 

The mammalian genome contains an estimated 20,000 CG islands. Most of 
the islands mark the 5' ends of transcription units and thus, presumably, of 
genes. The presence of CG islands often provides a convenient way of identify- 
ing genes in the DNA sequences of vertebrate genomes. 
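
One common way to put a number on the CG deficiency described above is the ratio of observed CG dinucleotides to the number expected from the C and G content alone. This measure is not defined in the text; the formula, the function name and the toy sequences below are assumptions for illustration.

```python
def cpg_observed_expected(seq):
    """Observed/expected CG dinucleotide ratio for a DNA sequence.

    Expected count is (number of C) * (number of G) / length, so the
    ratio is (CG count * length) / (C count * G count).
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Toy sequences: one rich in CG dinucleotides, one with none at all.
island_like = cpg_observed_expected("CGCGCGCG")
depleted = cpg_observed_expected("CAGTCAGTCAGT")
```

A ratio near or above 1 means CG occurs about as often as base composition predicts, whereas bulk vertebrate DNA, having lost methylated CG sequences, would give a much lower value.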



Summary 

The many types of cells in animals and plants are created largely through mecha- 
nisms that cause different genes to be transcribed in different cells. Since many 
specialized animal cells can maintain their unique character through many cell 
division cycles and even when grown in culture, the gene regulatory mechanisms 
involved in creating them must be stable once established and heritable when the 
cell divides. These features endow the cell with a memory of its developmental history.
Bacteria and yeasts provide unusually accessible model systems in which to study 
gene regulatory mechanisms. One such mechanism involves a competitive interac- 
tion between two gene regulatory proteins, each of which inhibits the synthesis of the 
other; this can create a flip-flop switch that switches a cell between two alternative 
patterns of gene expression. Direct or indirect positive feedback loops, which enable 
gene regulatory proteins to perpetuate their own synthesis, provide a general mech- 
anism for cell memory. Negative feedback loops with programmed delays form the 
basis for cellular clocks.

In eucaryotes the transcription of a gene is generally controlled by combinations 
of gene regulatory proteins. It is thought that each type of cell in a higher eucaryotic 
organism contains a specific combination of gene regulatory proteins that ensures 
the expression of only those genes appropriate to that type of cell. A given gene regu-
latory protein may be active in a variety of circumstances and typically is involved
in the regulation of many genes. 

In addition to diffusible gene regulatory proteins, inherited states of chromatin 
condensation are also used by eucaryotic cells to regulate gene expression. An espe-
cially dramatic case is the inactivation of an entire X chromosome in female mam-
mals. In vertebrates DNA methylation also functions in gene regulation, being used
mainly as a device to reinforce decisions about gene expression that are made ini- 
tially by other mechanisms. DNA methylation also underlies the phenomenon of 
genomic imprinting in mammals, in which the expression of a gene depends on 
whether it was inherited from the mother or the father. 
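
The flip-flop switch mentioned in this summary, in which two gene regulatory proteins each inhibit the synthesis of the other, can be sketched with a simple mutual-repression model. The equations, parameter values and function name below are assumptions chosen for illustration (a generic textbook-style model integrated with Euler steps), not a description of any particular organism.

```python
def simulate_toggle(a, b, alpha=10.0, n=2, dt=0.01, steps=5000):
    """Integrate two mutually repressing proteins A and B.

    Each protein is synthesized at a rate that falls as the other rises
    (alpha / (1 + other**n)) and is lost at a rate proportional to its
    own level.
    """
    for _ in range(steps):
        da = alpha / (1 + b ** n) - a
        db = alpha / (1 + a ** n) - b
        a, b = a + dt * da, b + dt * db
    return a, b


# The same circuit started with a slight excess of A, then of B.
a_wins = simulate_toggle(a=2.0, b=1.0)
b_wins = simulate_toggle(a=1.0, b=2.0)
```

Starting from a small imbalance, the circuit settles into one of two stable states and stays there, which is the sense in which such a switch gives the cell a memory of an earlier signal.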
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Figure 7-86 A mechanism to explain 
both the marked overall deficiency 
of CG sequences and their clustering 
into CG islands in vertebrate
genomes. A black line marks the location
of a CG dinucleotide in the DNA
sequence, while a red "lollipop" indicates
the presence of a methyl group on the
CG dinucleotide. CG sequences that lie in
regulatory sequences of genes that are
transcribed in germ cells are unmethylated
and therefore tend to be retained in
evolution. Methylated CG sequences, on
the other hand, tend to be lost through
deamination of 5-methyl C to T, unless the
CG sequence is critical for survival. 



POSTTRANSCRIPTIONAL CONTROLS 

In principle, every step required for the process of gene expression could be 
controlled. Indeed, one can find examples of each type of regulation, although 
any one gene is likely to use only a few of them. Controls on the initiation of 
gene transcription are the predominant form of regulation for most genes. But 
other controls can act later in the pathway from DNA to protein to modulate 
the amount of gene product that is made. Although these posttranscriptional 
controls, which operate after RNA polymerase has bound to the gene's promoter 
and begun RNA synthesis, are less common than transcriptional control, for 
many genes they are crucial. 
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AMPLIFICATION OF CELLULAR ONCOGENES IN CANCER CELLS 
K. ALITALO 

FROM THE DEPARTMENT OF VIROLOGY, UNIVERSITY OF HELSINKI, HELSINKI, FINLAND 

ABSTRACT 

Regulatory or structural alterations of cellular oncogenes have been implicated in the causation of various
cancers. Oncogene alteration by point mutations can result in a protein product with strongly enhanced on-
cogenic potential. Aberrant expression of cellular oncogenes may be due to tumour-specific chromosomal
translocations that dysregulate the normal functions of a proto-oncogene. Amplification of cellular onco-
genes can also augment their expression by increasing the amount of DNA template available for the pro-
duction of mRNA. It appears that amplification of certain oncogenes is a common correlate of the progres-
sion of some tumours and also occurs as a rare sporadic event affecting various oncogenes in different types
of cancer. Amplified copies of oncogenes may or may not be associated with chromosomal abnormalities
signifying DNA amplification: double minute chromosomes and homogeneously staining chromosomal
regions. Amplified oncogenes, whether sporadic or tumour type-specific, are expressed at elevated levels, 
in some cases in cells where their diploid forms are normally silent. Increased dosage of an amplified 
oncogene may contribute to the multistep progression of at least some cancers. 

KEY WORDS: CELLULAR ONCOGENES, GENE AMPLIFICATION, MULTISTEP CARCINOGENESIS, CLONAL SELECTION,
KARYOTYPIC ABNORMALITIES, DOUBLE MINUTE CHROMOSOMES, HOMOGENEOUSLY STAINING CHROMOSOMAL 
REGIONS 



DNA SEQUENCE AMPLIFICATION AND 
CYTOGENETIC ABNORMALITIES
IN TUMOURS 

Since its discovery in drug-resistant eukaryotic 
cells, somatic amplification of specific genes 
has been implicated in an increasing variety 
of adaptive responses of cells to environmental 
stresses (70, 79). Cytogenetic abnormalities, 
double minute chromosomes (dmin:s) asso- 
ciated with DNA amplification had already 
been discovered in tumour cells before the 
discovery of dmin:s and homogeneously stain- 
ing chromosomal regions (HSR:s) in cells se- 
lected for drug-resistance (12, 24, 49, 50, 56). 
In metaphase spreads, dmin:s appear as small, 
spherical, usually paired chromosome-like
structures that lack a centromere (Fig. 2). HSR:s 
stain with intermediate intensity throughout 
their length rather than with the normal pat- 
tern of alternating dark and light bands in 
both trypsin-Giemsa (Fig. 3A) and quinacrine 
dihydrochloride-stained preparations. Both 
kinds of abnormalities are occasionally found 
in metaphases of freshly isolated cancer cells 
but not of normal cells (8). 

Dmin:s and HSR:s are apparently rare in 
tumour cells in vivo, although exact data are 



difficult to obtain since the abnormalities are
easily missed in routine cytogenetic analysis
(8, 42). Dmin:s and HSR:s have been de-
scribed in most types of in vitro-cultured malig-
nant tumour cells, with a notable frequency
in neuroblastoma cell lines (11). Initial growth 
in cell culture apparently selects for tumour 
cells that contain either dmin:s or HSR:s.
Moreover, in culture dmin:s are frequently
lost, concomitant with the appearance of
clonal populations of cells that have devel-
oped an HSR, suggesting that the two cyto-
genetic abnormalities are alternative forms of
gene amplification and that HSR:s may confer
a selective advantage on cells over dmin:s (11,
70). It has been assumed that HSR:s can break
down to form dmin:s and that dmin:s can in-
tegrate into chromosomes to generate HSR:s
(11, 23). Amplified genes may also occupy ab- 
normally banding regions, ABR:s (51, 59). Ex- 
perimental work on drug-resistant cells has 
shown that in the absence of a selection pres- 
sure (drug), dmin:s and the amplified genes 
that they contain are lost, whereas amplified
DNA in the form of HSR:s is retained in the
cells (71). This is explained by the fact that
dmin:s are segregated unevenly in mitosis and
frequently get lost from the nucleus due to
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TABLE 1 
Currently known oncogenes.

Columns: Retrovirus (or Tumour cell) | Oncogene | Gene product: Cellular location | Gene product: Function of protein | Class

ONCOGENES FOUND IN RETROVIRUSES

Retrovirus: RSV; Y73V; GR-FeSV; Ab-MuLV; FuSV; ST- and GA-FeSV; UR2V
Oncogene: src; yes; fgr; abl; fps/fes; fes/fps; ros
Cellular location: Plasma membrane; Plasma membrane; Plasma membrane; Plasma membrane; Cytoplasm (plasma membrane?); Cytoplasm (cytoskeleton?)
Function of protein: Tyrosine-specific protein kinases (fgr contains sequences homologous to actin)
Class: Class 1a (Cytoplasmic tyrosine protein kinases)

Retrovirus: AEV; SM-FeSV; MH-2V; 3611-MSV; Mo-MSV
Oncogene: erb-B; fms
Cellular location: Plasma membrane; Plasma membrane; Cytoplasm; Cytoplasm; Cytoplasm

Retrovirus: SSV
Oncogene: sis
Cellular location: Secreted
Function of protein: PDGF-like growth factor
Class: Class 2 (Growth factors)

Retrovirus: Ha-MSV; Ki-MSV
Oncogene: Ha-ras; Ki-ras
Cellular location: Plasma membrane; Plasma membrane
Function of protein: GTP-binding proteins
Class: Class 3 (Cytoplasmic GTP:ases)

Retrovirus: FBJ-MuSV; OK-10V; AMV
Oncogene: fos; myc; myb
Cellular location: Nucleus; Nucleus; Nucleus
Function of protein: Nuclear matrix protein
Class: Class 4 (Nuclear phosphoproteins)

Retrovirus: SKV 770; REV; AEV; E26V
Oncogene: ski; rel; erb-A; ets
Cellular location: Nucleus?; ?; ?
Function of protein: ?; ?
Class: Unclassified

ONCOGENES FOUND IN TUMOUR CELLS BUT NOT IN RETROVIRUSES

Tumour cell: Neuroblastoma; Neuroblastoma; Small-cell lung cancer; Neuro-/Glioblastomas
Oncogene: N-ras; N-myc; L-myc; neu
Cellular location: Plasma membrane; ?; ?; Plasma membrane
Function of protein: GTP-binding; ?; ?; Growth factor receptor
Class: Class 3; Class 4; Class 4; Class 1b


their lack of centromeres (49). HSR chromosomes
carry centromeres and are therefore
divided equally at mitosis. If dmin:s and
HSR:s contain amplified genes that encode
growth-stimulating protein products, it would
follow that the more stable chromosomal
form, the HSR, confers a greater selective
growth advantage for cells. Although dmin:s
and HSR:s have been described predominantly
in tumour cells selected for resistance to cytotoxic
drugs, it is also clear that dmin:s and
HSR:s may be present in cancer cells before
any form of therapy (8). It was in this setting
that we and others first chose to explore the
possible amplification of cellular oncogenes.
(By definition, cellular oncogenes are normally
innocent genetic loci which can be activated
to transforming genes in various ways.) 1



DMIN:S AND HSR:S CONTAIN AMPLIFIED 
ONCOGENES 

Table 2 summarizes the somatic amplifica- 
tions of cellular oncogenes so far reported in 

1 It is not the purpose of this review to deal with all forms 
of DNA damage that have been found to activate cel- 
lular oncogenes. For the purpose of integrating the re- 
view into a coherent picture, however, the reader is 
given a list of known cellular oncogenes in Table 1 and 
the schematic Figure 1 illustrating the various ways in 
which the oncogenic potential of different proto-oncogenes
can be activated.

Fig. 1. Activation of cellular oncogenes. The haploid complement of a proto-oncogene is schematically depicted in A,
composed of three exons (black boxes) in a segment of DNA. The different activated forms are schematically outlined
in B-G. The abbreviation c-onc stands for cellular oncogene, v-onc for viral oncogene. [...]
strong promoter/enhancer functions are striated, and an actively transcribed gene is marked [...]. B. Acutely
transforming retroviruses have the capacity to transduce cellular oncogenes (c-onc) into their genome, modify them
and insert the activated oncogenes (v-onc) into the genome of host animal cells as a part of their proviral forms.
The activity of the v-onc gene is greatly enhanced due to the associated promoter of the proviral long terminal repeat.
[...] the oncogene, and structural mutations within its sequence may [...].
C. Slow transforming retroviruses without oncogenes replicate and reinsert their proviral copies into the host cell DNA
[...] from infection to tumorigenesis. Tumor initiation through hyperplastic growth may begin
when the provirus integrates sufficiently close to a proto-oncogene to activate it through promoter or enhancer functions.
It should be noted, however, that mutations have also been found in the oncogenes thus activated and that
mutational [...] to other oncogenes has been described in the resulting tumors. D. In some mouse plasmacytomas a
retrovirus-like DNA element (directing the synthesis of the so-called intracisternal A-type particles, IAPs) has been
found in association with a transcriptionally activated oncogene, c-mos. The IAP insertion also disrupts the [...] part of
the c-mos gene. E. In humans, as well as in animals, chromosome translocations may place proto-oncogenes into
transcriptionally active regions of chromatin, where they may be activated. The details of this mechanism have not been worked
out, but it is believed to occur for c-myc and c-abl genes in Burkitt lymphomas and Philadelphia-chromosome positive
leukemias, respectively (35, 40). F. Increased amounts of oncogene-specific RNA and protein can also result from an
excess of DNA template for transcription acquired through oncogene amplification. The present review concentrates
[...] on this mechanism. G. Mutationally activated oncogenes have been found in nearly one fifth of human
malignancies. Oncogenes activated by somatic structural mutations are revealed by transfection experiments
where they are introduced into the genetic background of nontumourigenic cultured immortalized cells. Several such
transforming loci have been cloned and many of them belong to the c-ras oncogene family. It should be pointed out
that both structural mutations and either increased expression or activation of a complementing oncogene may be
required to achieve a fully tumorigenic phenotype (44).



tumour cells. Although the sampling of tu- 
mours is at present small, the finding of 
known cellular oncogenes among amplified 
DNA represented by dmin:s and HSR:s of 
cancer cells is provocative. Amplification has 
been found to affect at least five out of twenty 
known cellular oncogenes and the degree of 
gene amplification varies from five to many 
hundred-fold over the single haploid copies 
found in normal cells (see also ref. 18). The 
first amplification reported involved the c-myc
oncogene (see Table 1) in a promyelocyte leukaemia
cell line HL-60 (20, 25). The degree of
c-myc amplification is between 8- and 32-fold
both in the HL-60 cell line and in primary leukaemic
cells from the patient (20, 25). Original
clonal lines of HL-60 were later found to
contain some dmin:s in culture but their number
was insufficient to establish any clear correlation
with amplified c-myc. Such a correlation,
however, was discovered for c-myc amplification
in a neuroendocrine cell line from
a colon carcinoma, COLO 320 (5). In these
cells, the approximately 30-fold amplified
c-myc copies were mapped either to HSR:s of
a marker chromosome (5, Fig. 3B) or to dmin:s
[...], depending on the particular subline studied.
Since dmin:s were already present in the
tumour cells from this colon carcinoma,
it is likely that c-myc had also been
amplified during growth of the tumour in vivo.

Fig. 2. Dmin:s in COLO 320 DM colon carcinoma cells. A. The dmin:s are
seen as paired dots among normal chromosomes in
this fluorescent, benzimidazole-stained preparation. B-D.
[...]ration of dmin:s by differential centrifugations. B.
The starting material, C. Chromosome fraction, D. Purified
dmin:s (Donna George and the author, unpublished
data and ref. 52).

Similarly, amplified copies of the c-Ki-ras oncogene
were mapped to dmin:s and HSR:s of
the adrenocortical tumour Y1 (74). An extensive
search for changes in other oncogenes
in tumour cells has since revealed amplifications
that do not show up as dmin:s or HSR:s.


Fig. 3. A. [...] the homogeneously staining regions (HSR) in
the G-banded HSR-marker chromosome comprise a major
portion of both its long and short arms. The HSR-marker
chromosome has evolved from an X chromosome
(52 and unpublished data of C. C. Lin and the author). B.
Amplified copies of the c-myc oncogene in COLO 320
cells were found to be located to dmin:s and
HSR:s. The latter is shown here by in situ hybridization
(5, 52).



Thus, for example, the c-myb oncogene is
amplified in a characteristic marker chromosome
in a colon carcinoma without evidence
of HSR:s (ref. 6, Fig. 4) and in other tumours,
the amplified c-abl and c-myc oncogene loci
map to abnormally banding regions (ABR:s) in
translocated or resident chromosomal segments,
respectively (59, 76).

TRANSLOCATIONS AND 
REARRANGEMENTS MAY ACCOMPANY 
ONCOGENE AMPLIFICATION 

The evolution and progression of the karyo- 
type of tumour cells is complicated (see ref. 
68). Concomitant with amplification, DNA se- 
quences acquire an increased mobility in the 
genome with extrachromosomal intermediates

TABLE 2

Sporadic and tumour-specific amplification of cellular oncogenes.*

Tumour cells                  Oncogene  Fold         Chromosomal Expression Remarks                             References
                                        amplified    location of elevated
                                                     amplified gene

Sporadic:

HL60 (acute promyelocytic     c-myc     20 x         8q(ABR)     Yes        Amplification present in primary    20, 25, 59
leukaemia, M3)                                                              leukaemic cells

COLO320 (colon carcinoma)     c-myc     30 x         dmin, HSR   Yes        Part of the amplified c-myc         4, 5, 52
                                                                            sequences rearranged

Y1 (adrenocortical tumour)    c-Ki-ras  50 x         dmin, HSR   Yes        Levels of p21 c-Ki-ras protein      74
                                                                            elevated

COLO201/205 (colon carcinoma) c-myb     10 x         mar1        Yes        Patient treated with 5-fluorouracil 4, 6, 88
                                                                            prior to culturing of the tumour
                                                                            cells

K562 (chronic myelogenous     c-abl     10 x         mar(ABR)    Yes        Cλ coamplified in the marker that   21, 22, 41,
leukaemia, CML)                                                             may be derived from chromosome 22,  54, 76
                                                                            c-abl protein-associated tyrosine
                                                                            kinase activated

A431 (epidermoid carcinoma)   c-erbB    15-20 x      n.d.        Yes        Amplification linked to chromosome  82
                                                                            7 translocation and sequence
                                                                            rearrangements. Amount of protein
                                                                            product, the EGF receptor, elevated
                                                                            (see 36)

ML1-3 (acute myeloid          c-myb     5-10 x       n.d.        Yes        Abnormalities of chromosome         34, 61, 91
leukaemia, M2)                                                              6q22-24, where c-myb is normally
                                                                            located

SK BR-3 (breast carcinoma)    c-myc     10 x         n.d.        Yes                                            43

SEWA (polyoma virus-induced   c-myc     30 x         n.d.        Yes        Cells have dmin:s depending on      Manfred Schwab,
mouse tumour)                                                               culture conditions; c-myc           personal
                                                                            amplification correlates with       communication
                                                                            growth as a tumour

Lu-65 (lung giant cell        c-myc     8 x          n.d.        n.d.       At least some copies of c-Ki-ras    80
carcinoma)                    c-Ki-ras  10 x         n.d.        n.d.       mutated

Primary leukemic cells from   c-myc     33 x         n.d.        n.d.                                           Unpublished data of
an acute myeloid leukemia                                                                                       the author and
(M2) patient                                                                                                    A. de la Chapelle

Tumour-specific:

Small-cell lung cancer        c-myc,    up to 80 x   n.d.        Yes        Most amplifications in the variant  53, 69
                              L-myc,                                        phenotype of SCLC
                              N-myc

Neuroblastomas                N-myc     up to 250 x  dmin, HSR   Yes        N-myc also amplified in primary     14, 48, 72,
                                                                            tumours of advanced grade           73, 75

Glioblastomas                 c-erbB    -                                   Rearrangements also found           Josef Schlessinger,
                                                                                                                personal communication

n.d. = not determined, mar = marker chromosome, M2, M3 refer to the French-American-British classification of
acute myeloid leukaemias.

* At least one case of oncogene amplification in normal germ-line cells has been found (18).



visualized as dmin:s, transpositions and trans- 
locations to other chromosomal segments, etc. 
(see 70 for references). There may not be preferred
chromosomal sites for the apparent
reintegration of dmin:s as HSR:s (75). In at
least one case, however, an oncogene may
have been caught amplifying in situ in its resident
chromosomal site (59). The finding of
moderately amplified oncogenes also in chromosomal
sites lacking HSR:s suggests that
(onco)gene amplification may be more common
than the structural alterations shown by
chromosome banding and microscopy (6, 88).

Fig. 4. Localization of amplified c-myb in COLO 201/205
cells by in situ hybridization. Shown is a characteristic,
large marker chromosome (mar1) with G-banding (A)
and associated c-myb autoradiographic grains (B). Note
the absence of HSR:s. Mar1 has probably evolved from
chromosome number 6, the resident site of the c-myb oncogene
in normal cells (34, 88, 91). (Robert Winqvist and
the author, unpublished data).

In at least three cases reported amplification
has been accompanied by a DNA rearrangement
of the oncogene (5, 20, 82). In the colon
carcinoma COLO 320 both damaged and normal
versions of the c-myc gene are amplified
(5). Although individual cell clones have not
yet been examined, our unpublished experiments
suggest that the same dmin-containing
cells harbor and express both normal and rearranged
forms of c-myc. The normal version
of the amplified gene, however, predominates
in COLO 320 cells containing HSR:s; the rearranged
version is present only in what appears
to be a single copy (Fig. 5). In the
chronic myeloid leukaemia (erythroleukaemia)
cell line K562 an amplified DNA segment
consists of portions of both the c-abl oncogene
and the immunoglobulin Cλ locus (76).
In both cases abnormal transcripts are produced
from the rearranged amplified oncogenes
(Fig. 6 and ref. 22). In K562 cells, the abnormal
c-abl oncogene product has also been activated
as a tyrosine protein kinase (41). It is not
known, however, whether structural alterations
of the genes preceded amplification or
whether they were acquired during the process
of gene amplification. It seems likely that a
chromosomal translocation of c-abl to the
[...] preceded DNA amplification in the
K562 cells, since all amplified copies were also
rearranged (21), with the change reminiscent
of the Philadelphia translocation (t(9, 22))
found in most chronic myeloid leukaemia tumours
(35, 66-68). Although they have not
been sequenced, other reported cases of amplified
oncogenes are apparently normal on
the basis of mapping with restriction endonucleases
(see Table 2). Therefore we cannot at
present view mutation as a necessary companion
of oncogene amplification.

Fig. 5. Amplification and rearrangement of c-myc in
COLO 320 cells. 10 µg of cellular DNA was digested with
Sst I, electrophoresed, blotted and probed with a v-myc
Pst I fragment (ref. 2, left panel). Fragments of 2.7 kbp
and 1.4 kbp are seen in both normal and amplified c-myc
DNA. The 3.3 kbp fragment is derived from a DNA segment
of unknown origin translocated to the 5' region of
c-myc with a concomitant deletion of its first exon (unpublished
data of Manfred Schwab and the author). HSF,
human skin fibroblasts; DM, COLO 320 DM cells; HSR,
COLO 320 HSR cells. Different amounts of DNA from
COLO 320 DM cells as indicated were mixed with calf
thymus DNA to give 24 µg of total DNA, cleaved with
Sst I, electrophoresed, blotted and probed with a fragment
of 3' human c-myc sequences. The intensities of the
2.7 kbp c-myc fragment in different samples were compared
to assess its copy number, estimated to be about
30 (5).
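
The dilution series described in the legend of Fig. 5 can be turned into a copy-number estimate with simple arithmetic. The sketch below is illustrative only: it assumes that band intensity is proportional to the mass of target sequence loaded, the DNA masses in the example are hypothetical (the legend does not list the individual amounts), and the function names are invented for this illustration.

```python
TOTAL_DNA_UG = 24.0  # total DNA per lane, as stated in the Fig. 5 legend


def carrier_dna_ug(tumour_dna_ug, total_ug=TOTAL_DNA_UG):
    """Calf thymus DNA needed to bring a lane up to the fixed total."""
    if not (total_ug >= tumour_dna_ug > 0):
        raise ValueError("tumour DNA must be positive and at most the lane total")
    return total_ug - tumour_dna_ug


def relative_copy_number(reference_dna_ug, matching_tumour_dna_ug):
    """Gene copies relative to a single-copy reference genome.

    Assumption: band intensity scales linearly with the mass of target
    DNA, so two lanes of equal intensity hold equal amounts of c-myc.
    """
    if not (reference_dna_ug > 0 and matching_tumour_dna_ug > 0):
        raise ValueError("DNA masses must be positive")
    return reference_dna_ug / matching_tumour_dna_ug


# Hypothetical example: a lane holding 0.8 ug of COLO 320 DM DNA gives the
# same 2.7 kbp band intensity as a lane holding 24 ug of fibroblast DNA.
estimate = relative_copy_number(24.0, 0.8)
carrier = carrier_dna_ug(0.8)
print(f"about {estimate:.0f}-fold amplified; {carrier:.1f} ug carrier DNA added")
```

Keeping the total DNA per lane constant, as the legend describes, means that differences in band intensity reflect the amount of tumour DNA and not differences in loading or transfer.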



THE MECHANISMS OF GENE 
AMPLIFICATION 

The mechanisms of gene amplification and 
the structure of the amplified DNA have been 
worked out mainly in experimental settings 
involving selection for drug-resistance in cell 
culture (70). Although the mechanisms are 
still incompletely known and may vary in dif- 
ferent cases, some general features have 
emerged. 

A spontaneous degree of illegitimate DNA 
replication seems to exist in normal cells so 
that various segments of DNA are replicated 
more than once during a single cell cycle (37). 
In unselective conditions this DNA is probably
lost, e.g., through formation of micronuclei,
because the newly synthesised extra copies of
DNA are not covalently linked to chromosomal
DNA of mitotic cells (65, 71).

Fig. 6. Comparison of the electrophoretic mobilities of
c-myc mRNA:s from COLO 320 DM and HSR cells. The
size of the normal c-myc mRNA is 2.3 kb. The rearranged
c-myc locus in DM cells (see Fig. 5) seems to be predominantly
expressed, giving rise to a shortened RNA.

If, however,
there is a selection pressure to retain an
increased gene dosage, progressive multipli- 
cation of gene copy number results. The in- 
cidence of cells bearing amplified genes under 
conditions of cytotoxic selection can vary by 
two orders of magnitude and is greatly in- 
creased by the presence of mitogenic sub- 
stances (hormones or tumour promoters) dur- 
ing selection (10, 84, 85) or certain carcino- 
genic or cytotoxic agents before selection (15, 
55, 79, 80, 81, 85). An interesting hypothesis 
suggested by Varshavsky (84, 85) supposes that 
the origins of DNA replication "fire" (initiate 
replication) illegitimately several times during 
a single cell cycle and that this kind of "repli- 
con misfiring" may be increased by substances 
such as tumour promoters and mitogenic hor- 
mones (10, 84, 85). Mariani and Schimke (55) 
point out that most of the cytotoxic agents that 
increase the incidence of gene amplification 
are inhibitors of DNA synthesis. Aberrant 
replication is known to take place after transient
inhibition of DNA synthesis and this response
can lead to gene amplification (46, 47,
55, 90). Mitogenic hormones probably increase 
disproportionate DNA replication, but they
also enhance the colony forming efficiency of
drug-resistant cells in selective conditions (10).

According to the studies of Axel and his col- 
laborators (65), the multiple cycles of un- 
scheduled DNA replication at a single locus 
during a single cell cycle result in a structure 
schematically outlined in Fig. 1F. The hydrogen-bonded
amplified copies of DNA depicted
in Fig. 1F must resolve into a tandem linear
array before the next mitosis. This may well 
occur by homologous recombination between 
any one of several repeated sequences within 
the amplified domain (45, 65). Part of the re- 
combinations would lead to extrachromosomal 
circles possessing an origin for replication (16, 
62); these could be the precursors of dmin:s. 
The unequal recombinations mean that the 
resolved linear structure consists of tandemly 
repeated but heterogeneous units. According 
to Axel's model a gradient of amplification is 
formed so that centrally located sequences are 
amplified more than sequences distal to the 
origin of replication (65). This also has, in fact, 
been found to explain the large, complex DNA 
domain amplified in neuroblastoma cells in 
vivo (38, see also below). 

The chromosomal site of integration of 
transfected genes significantly affects the fre- 
quency and cytogenetic result of their experi- 
mentally induced amplification (83). The 
amplification frequency in some transfor- 
mants has been found to be 100-fold that of 
the others (83). This suggests that there also 
are preferred chromosomal positions for am- 
plification of host cellular genes and that chro- 
mosomal rearrangements may facilitate gene 
amplification by positioning chromosomal se- 
quences in a favorable array. In respect of the 
structural properties of the sequences involved 
in gene amplification, recombinatorially active 
regions have been implicated in experimental 
cases. DNA rearrangements involving restric- 
tion fragment length polymorphisms and 
variation in gene copy number have been de- 
tected in the human genome between clusters 
of short repetitive interspersed DNA sequences 
called Alu family DNA-sequences (17). Such 
inter-Alu sequences have also been detected 
in an extrachromosomal DNA form, including 
covalently closed circles (17, 78). The copy
number of inter-Alu sequences apparently varies
in an age- and tissue-specific manner (17,
78), but any comprehensive analysis of the 
phenomenon in human tumours is not yet 
available. It is also not yet clear whether these 
kinds of repetitive sequences are involved in 
generating amplified oncogene sequences in 
dmin:s or HSR:s in tumours.

Fig. 7. Elevated levels of the p21 c-Ki-ras protein in Y1 cells
(74). The Y1 DM and HSR cells, which harbor a 50-fold
amplified c-Ki-ras oncogene (74), [...]
labeled with [35S]-methionine and the p21 c-Ki-ras protein
was immunoprecipitated, as detailed [...], with normal
rat serum (NRS) or rat monoclonal anti-p21 serum. The
proteins were electrophoresed in a 15 % polyacrylamide
gel in the presence of SDS. In addition to a major p21
band, a labeled band at about 16 kd was present in the
immunoprecipitates. The amount of radioactivity in p21
was about 50 fold that in normal rat kidney cells. The
Kirsten sarcoma virus-transformed rat kidney cells (obtained
from the American Tissue Culture Collection) also
yielded unexpectedly low amounts of the v-Ki-ras
protein.

Fig. 8. A. Indirect immunofluorescence for p21 c-Ki-ras in Y1
DM cells. Similar fluorescence of the plasma membrane
was obtained for the Y1 HSR cells. Inset (b) shows control
staining with normal rat serum and inset (d) [...]
normal rat kidney cells with the monoclonal antibody
against p21.



CARCINOGEN-INDUCED GENE 
AMPLIFICATION AND CLONAL
SELECTION OF CANCER CELLS

Although cell sorting experiments have
shown a basal spontaneous rate of gene amplification
in eukaryotic cells (37), this can be
increased severalfold by metabolic inhibitors
or cytotoxic agents (15, 37, 70, 81, 85). In
many respects the latter response is reminiscent
of the so-called SOS-response elicited in
bacteria by noxious stimuli (see 28). In a teleological
context, the rapid induction of gene
amplification that apparently occurs frequently
through extrachromosomal intermediates
may provide cells with genetic material for
subsequent selective pressures operating in
harmful conditions (60). In cancer cells, the
mechanism may enhance the emergence of
clonal populations of cells with increasingly 
malignant properties (58, 60). Such genetic 
instability of cancer cells is clearly enhanced, 
leading to the rapid evolution of increasingly 
malignant tumour cell populations (19, 58). A 
serious question of practical importance is 
whether drug resistance in treated patients 
also selects cells that have an enhanced ability 
to amplify (onco)genes important for growth 
and progression of the tumour (84, 85). It is 
also possible that some of the carcinogenic in- 
sults caused by mutagens are only expressed 
as a result of subsequent amplification events 
induced by tumour promoters (84, 85) or facil- 
itated by hormones in, say replicating epithelial 
cells (10). The persistence of dmin:s in some
tumours suggests that there is a selection
pressure for their retention (8, 9, 11, [...]).
Amplified DNA in dmin:s must contain an
origin for DNA replication (62) and must be
selected for in daughter cell populations,
where it is unevenly segregated (71). In the
absence of such a selection pressure dmin:s
are lost (71). In at least one study the length of
an HSR has been found to increase during a 
selection of malignant cells for enhanced 
tumourigenicity (30). 

The amplified c-erbB gene in A431 cells codes 
for epidermal growth factor receptor (27). The 
abundant amounts of receptor protein on 
A431 cell surface may, however, provide the 
cells with an abnormal growth response (31).
A naive supposition is that the amplified sequences
in dmin:s and possibly in HSR:s of
tumours contain growth-promoting genes (see
36 for references). This seems to fit well with
recent findings on amplified oncogenes, though
in many cases the search for an amplified oncogene
is still continuing. Even positive findings
do not mandate a role for amplified cellular
oncogenes, however, because the domain
of amplified DNA is inevitably much
larger than a single genetic locus (e.g. 38).

Fig. 9. Construction of a v-myc expression vector. A synthetic linker (ATGAGAATTCATCAT) containing a translational
initiation codon was inserted downstream from the trp promoter in the pSRC ex 16 Rl expression vector described
previously (see ref. 3). Approximately one half of the v-src sequences coding for the aminoterminal portion of the pp60 src
protein were then deleted and the remaining portion ligated in translational codon frame with the synthetic ATG. A
Hinc II fragment of v-myc from plasmid clone MC 38 (nucleotides 320-685 in the v-myc sequence in ref. 2) was ligated
downstream from the remaining v-src sequences in continuity with its reading frame. The resulting product contained 3
amino acids from the synthetic linker, 252 amino acids encoded by the 756 base pair fragment from Sma I to Pvu II
restriction sites in v-src DNA, 122 amino acids from the v-myc and 6 amino acids (corresponding to nucleotides
2068-2085) from the pBR322 vector (3).
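
The segment sizes quoted in the legend of Fig. 9 can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in that legend; the average residue mass of 110 Da is a common rule of thumb introduced here as an assumption and does not come from the text, and the variable names are illustrative.

```python
AVG_RESIDUE_MASS_DA = 110  # assumption: rule-of-thumb mean mass per amino acid


def codons(nucleotides):
    """Number of whole codons in an in-frame stretch of nucleotides."""
    if nucleotides % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return nucleotides // 3


linker_aa = 3                    # amino acids from the synthetic linker
src_aa = codons(756)             # Sma I to Pvu II fragment of v-src
myc_aa = codons(685 - 320 + 1)   # v-myc nucleotides 320-685, inclusive
vector_aa = 6                    # amino acids read from the pBR322 vector

total_aa = linker_aa + src_aa + myc_aa + vector_aa
approx_mass_da = total_aa * AVG_RESIDUE_MASS_DA

print(src_aa, myc_aa, total_aa, approx_mass_da)
```

Under that assumption the predicted mass of the hybrid protein is close to the 43,000 m.w. quoted for the bacterial product in the legend of Fig. 10.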



ENHANCED EXPRESSION OF AMPLIFIED 
ONCOGENES 

In all cases where they have been studied, the 
amplified oncogenes have been found abund- 
antly expressed at the RNA level, roughly in 
proportion to the amount of DNA amplification
(see Table 2). Described cases of elevated
RNA expression include examples of abnormal
(5, 22) and ectopic (6) transcription. In at
least four cases this enhancement is not limited
to synthesis of RNA (31, 33, 41, 74, 82).
The Y1 cells that have amplified c-Ki-ras contain
exceptionally large amounts of its protein
product situated on the plasma membrane
(ref. 74, Fig. 7 and 8). High amounts of the
c-myc encoded protein are also found in
COLO 320 cells that have amplified the gene
(33). The myc oncogenes have recently been
shown to encode nuclear proteins (ref. 1, 3,
26, 29, 32, 33, Fig. 9-11). Both the expression
of the c-myc mRNA (39) and the subcellular
localization of myc proteins are linked to
the cell cycle (ref. 89, Fig. 12). It may be that
elevated expression of specific c-myc functions
is necessary for cell cycle progression
and the growth transformation aspect of the
phenotype of cancer cells that may contribute
to tumour progression (7, 36). Elevated expression
of c-myc has been shown to replace
in part platelet-derived growth factor in induction
of competence for DNA replication
(7). Generally, enhanced expression of an oncogene
could be a necessary prerequisite for
acquisition of a growth advantage by cells
having extra copies of the gene. This effect
could also be the principal contribution of
amplification to tumourigenesis.

Fig. 10. E. coli 294 was transfected with the hybrid v-src,
v-myc plasmid outlined in Fig. 9 and ampicillin-resistant
bacterial colonies were checked for the production of a
43,000 m.w. bacterial v-myc protein after induction by
growth to different optical densities in minimal essential
medium (M9, induction +) or complete medium (LB,
induction -) (3).



TUMOUR CELL AND STAGE SPECIFICITY 
OF ONCOGENE ACTIVATION AND 
AMPLIFICATION 

Tumour cell specificity of oncogene amplification
has been found in three malignancies.
The c-myc, L-myc or N-myc oncogene is amplified
in most cases of the variant form of small-cell
lung cancer cells (53, 69), c-erbB is amplified
in several glioblastomas (Josef Schlessinger,
personal communication) and the putative
N-myc oncogene is amplified in about half of
grade III and IV neuroblastomas (14, 72, 73,
75). In addition to HSR:s, small-cell lung cancers
and neuroblastomas frequently show a
deletion of a portion of the short arm of chromosome
3 (86, 87) and chromosome 1 (13),
respectively, in karyological examination.
Two kinds of changes have also been described
in different neuroblastoma oncogenes. The
first is a mutation in the N-ras gene, an activated
oncogene that was discovered because
of its relation to other ras genes and transforming
activity in transfection experiments
(77). The second is amplification of a distant
homologue of the c-myc gene called N-myc (72,
73, 75). Although the transforming potential
of the N-myc gene has not yet been established,
its consistent presence in a core segment of
amplified neuroblastoma DNA (38, 57, 72, 73,
75) and its elevated expression in most retinoblastomas
(48) suggest its oncogenic nature.

Fig. 12. Fluorescent staining for DNA and myc protein in
myelocytomatosis virus-transformed quail cells. In interphase
cells, the myc protein is confined to the nuclei.
In the mitotic cell, myc fluorescence is distributed
throughout the cell unlike fluorescence for chromatin,
which is compacted to chromosomes in the metaphase
plate. In fact, there is less myc fluorescence associated
with chromatin than with the rest of the cell. DAPI,
diamidinophenylindole DNA stain. The anti-myc protein rabbit
antiserum was used in a 1/200 dilution (ref. 89).

Taya et al. (80) have recently described a 
human lung giant cell carcinoma grown in 
nude mice, where both c-Ki-ras and c-myc oncogenes
were amplified about 10-fold. Besides,
sequencing studies indicated that at least
some of the amplified c-Ki-ras copies were
also mutationally activated in the 12th codon.
These results fit the multistage theory of
cancer development and progression (see 58). 
Apparently co-operating lesions in cellular on- 
cogenes accumulate during tumour growth 
and selection and increase the malignant po- 
tential of the tumour cells (44). 

When does oncogene amplification come into
play during tumourigenesis? Gene amplification
may not be an initiating event in carcinogenesis.
Amplification and enhanced expression
of c-myc and N-myc may occur during
the progression of human carcinoma of the 
lung and neuroblastoma cells to a more malig- 
nant phenotype (14, 53, 73). There may be, 
however, no mandatory sequence of onco- 
gene amplifications for the genesis of any par- 
ticular tumor. Amplification of an oncogene 
could play its part in malignant progression of 
already initiated cells whenever it happened 
to occur.

[...]acts. If these minor cell proteins differ among cells to the same extent as the
more abundant proteins, as is commonly assumed, only a small number of protein
differences (perhaps several hundred) suffice to create very large differences
in cell morphology and behavior.

A Cell Can Change the Expression of Its Genes 
in Response to External Signals 3 

Most of the specialized cells in a multicellular organism are capable of altering 
their patterns of gene expression in response to extracellular cues. If a liver cell 
is exposed to a glucocorticoid hormone, for example, the production of several 
specific proteins is dramatically increased. Glucocorticoids are released during 
periods of starvation or intense exercise and signal the liver to increase the 
production of glucose from amino acids and other small molecules; the set of 
proteins whose production is induced includes enzymes such as tyrosine amino- 
transferase, which helps to convert tyrosine to glucose. When the hormone is no 
longer present, the production of these proteins drops to its normal level. 

Other cell types respond to glucocorticoids in different ways. In fat cells, for 
example, the production of tyrosine aminotransferase is reduced, while some 
other cell types do not respond to glucocorticoids at all. These examples illustrate 
a general feature of cell specialization: different cell types often respond in different
ways to the same extracellular signal. Underlying this specialization are
features that do not change, which give each cell type its permanently distinc- 
tive character. These features reflect the persistent expression of different sets of 
genes. 



Gene Expression Can Be Regulated at Many of the Steps 
in the Pathway from DNA to RNA to Protein 4 

If differences between the various cell types of an organism depend on the par- 
ticular genes that the cells express, at what level is the control of gene expression 
exercised? There are many steps in the pathway leading from DNA to protein, and 
all of them can in principle be regulated. Thus a cell can control the proteins it 
makes by (1) controlling when and how often a given gene is transcribed (tran- 
scriptional control), (2) controlling how the primary RNA transcript is spliced or 
otherwise processed (RNA processing control), (3) selecting which completed 
mRNAs in the cell nucleus are exported to the cytoplasm (RNA transport control),
(4) selecting which mRNAs in the cytoplasm are translated by ribosomes
(translational control), (5) selectively destabilizing certain mRNA molecules in 
the cytoplasm (mRNA degradation control), or (6) selectively activating, inacti- 
vating, or compartmentalizing specific protein molecules after they have been 
made (protein activity control) (Figure 9-2). 

For most genes transcriptional controls are paramount. This makes sense
because, of all the possible control points illustrated in Figure 9-2, only
transcriptional control ensures that no superfluous intermediates are synthesized.
In the following sections we discuss the DNA and protein components that regulate
the initiation of gene transcription. We return at the end of the chapter to the
other ways of regulating gene expression.

Figure 9-2 Six steps at which
eucaryote gene expression can be
controlled. Only controls that operate
at steps 1 through 5 are discussed in
this chapter. The regulation of protein
activity (step 6) is discussed in
Chapter 5; this includes reversible
activation or inactivation by protein
phosphorylation as well as
irreversible inactivation by proteolytic
degradation.

Summary 

The genome of a cell contains in its DNA sequence the information to make many 
thousands of different protein and RNA molecules. A cell typically expresses only a 
fraction of its genes, and the different types of cells in multicellular organisms arise
because different sets of genes are expressed. Moreover, cells can change the pattern 
of genes they express in response to changes in their environment, such as signals from 
other cells. Although all of the steps involved in expressing a gene can in principle be 
regulated, for most genes the initiation of RNA transcription is the most important 
point of control.



DNA-binding Motifs in Gene 
Regulatory Proteins 5 

How does a cell determine which of its thousands of genes to transcribe? As
discussed in Chapter 8, the transcription of each gene is controlled by a regulatory
region of DNA near the site where transcription begins. Some regulatory regions 
are simple and act as switches that are thrown by a single signal. Other regula- 
tory regions are complex and act as tiny microprocessors, responding to a vari- 
ety of signals that they interpret and integrate to switch the neighboring gene on 
or off. Whether complex or simple, these switching devices consist of two fun- 
damental types of components: (1) short stretches of DNA of defined sequence 
and (2) gene regulatory proteins that recognize and bind to them. 

We begin our discussion of gene regulatory proteins by describing how these 
proteins were discovered. 

Gene Regulatory Proteins Were Discovered Using 
Bacterial Genetics 6 

Genetic analyses in bacteria carried out in the 1950s provided the first evidence 
of the existence of gene regulatory proteins that turn specific sets of genes on 
or off. One of these regulators, the lambda repressor, is encoded by a bacterial 
virus, bacteriophage lambda. The repressor shuts off the viral genes that code for 
the protein components of new virus particles and thereby enables the viral ge- 
nome to remain a silent passenger in the bacterial chromosome, multiplying with 
the bacterium when conditions are favorable for bacterial growth (see Figure 
6-80). The lambda repressor was among the first gene regulatory proteins to be 
characterized, and it remains one of the best understood, as we discuss later. 
Other bacterial regulators respond to nutritional conditions by shutting off genes 
encoding specific sets of metabolic enzymes when they are not needed. The lac 
repressor, for example, the first of these bacterial proteins to be recognized, turns 
off the production of the proteins responsible for lactose metabolism when this 
sugar is absent from the medium. 

The first step toward understanding gene regulation was the isolation of 
mutant strains of bacteria and bacteriophage lambda that were unable to shut 
off specific sets of genes. It was proposed at the time, and later proved, that most 
of these mutants were deficient in proteins acting as specific repressors for these
sets of genes. Because these proteins, like most gene regulatory proteins, are 
present in small quantities, it was difficult and time-consuming to isolate them. 
They were eventually purified by fractionating cell extracts on a series of stan- 
dard chromatography columns (see pp. 166-169). Once isolated, the pro- 
teins were shown to bind to specific DNA sequences close to the genes that they 
Figure 9-3 Double-helical structure
of DNA. The major and minor grooves
on the outside of the double helix are
indicated. The atoms are colored as
follows: carbon, dark blue; nitrogen,
light blue; hydrogen, white; oxygen,
red; phosphorus, yellow.
Figure 9-71 A mechanism to explain
both the marked deficiency of CG
sequences and the presence of CG
islands in vertebrate genomes. A
black line marks the location of an
unmethylated CG dinucleotide in the
DNA sequence, while a red line marks
the location of a methylated CG
dinucleotide.
Summary

The many types of cells in animals and plants are created largely through mecha- 
nisms that cause different genes to be transcribed in different cells. Since many spe- 
cialized animal cells can maintain their unique character when grown in culture, the 
gene regulatory mechanisms involved in creating them must be stable once estab- 
lished and heritable when the cell divides, endowing the cell with a memory of its 
developmental history. Procaryotes and yeasts provide unusually accessible model 
systems in which to study gene regulatory mechanisms, some of which may be rel- 
evant to the creation of specialized cell types in higher eucaryotes. One such mecha- 
nism involves a competitive interaction between two (or more) gene regulatory pro- 
teins, each of which inhibits the synthesis of the other; this can create a flip-flop 
switch that switches a cell between two alternative patterns of gene expression.
Direct or indirect positive feedback loops, which enable gene regulatory proteins to
perpetuate their own synthesis, provide a general mechanism for cell memory. 

In eucaryotes gene transcription is generally controlled by combinations of gene 
regulatory proteins. It is thought that each type of cell in a higher eucaryotic organism 
contains a specific combination of gene regulatory proteins that ensures the expres- 
sion of only those genes appropriate to that type of cell. A given gene regulatory
protein may be expressed in a variety of circumstances and typically is involved in the
regulation of many genes. 

In addition to diffusible gene regulatory proteins, inherited states of chromatin
condensation are also utilized by eucaryotic cells to regulate gene expression. In
vertebrates DNA methylation also plays a part, mainly as a device to reinforce decisions
about gene expression that are made initially by other mechanisms. 



Posttranscriptional Controls 

Although controls on the initiation of gene transcription are the predominant 
form of regulation for most genes, other controls can act later in the pathway 
from RNA to protein to modulate the amount of gene product that is made.
Although these posttranscriptional controls, which operate after RNA polymerase
has bound to the gene's promoter and begun RNA synthesis, are less common
than transcriptional control, for many genes they are crucial. It seems that every
step in gene expression that could be controlled in principle is likely to be
regulated under some circumstances for some genes.

We consider the varieties of posttranscriptional regulation in temporal
order, according to the sequence of events that might be experienced by an RNA
molecule after its transcription has begun (Figure 9-72).
Figure 6-3 Genes can be expressed
with different efficiencies. Gene A is
transcribed and translated much more
efficiently than gene B. This allows the
amount of protein A in the cell to be
much greater than that of protein B.
FROM DNA TO RNA

Transcription and translation are the means by which cells read out, or express, 
the genetic instructions in their genes. Because many identical RNA copies can 
be made from the same gene, and each RNA molecule can direct the synthesis 
of many identical protein molecules, cells can synthesize a large amount of 
protein rapidly when necessary. But each gene can also be transcribed and 
translated with a different efficiency, allowing the cell to make vast quantities of 
some proteins and tiny quantities of others (Figure 6-3). Moreover, as we see in 
the next chapter, a cell can change (or regulate) the expression of each of its 
genes according to the needs of the moment — most obviously by controlling 
the production of its RNA. 



Portions of DNA Sequence Are Transcribed into RNA 

The first step a cell takes in reading out a needed part of its genetic instructions 
is to copy a particular portion of its DNA nucleotide sequence — a gene — into an 
RNA nucleotide sequence. The information in RNA, although copied into another 
chemical form, is still written in essentially the same language as it is in DNA — 
the language of a nucleotide sequence. Hence the name transcription. 

Like DNA, RNA is a linear polymer made of four different types of nucleotide 
subunits linked together by phosphodiester bonds (Figure 6-4). It differs from 
DNA chemically in two respects: (1) the nucleotides in RNA are
ribonucleotides — that is, they contain the sugar ribose (hence the name ribonu- 
cleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the 
bases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U) 
instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen- 
bonding with A (Figure 6-5), the complementary base-pairing properties 
described for DNA in Chapters 4 and 5 apply also to RNA (in RNA, G pairs with 
C, and A pairs with U). It is not uncommon, however, to find other types of base 
pairs in RNA; for example, G pairing with U occasionally. 
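
The base-pairing rules above can be illustrated with a minimal Python sketch that builds the RNA complementary to a DNA template strand. The function name `transcribe` and the example sequence are hypothetical and chosen only for illustration; the sketch deliberately ignores promoters, RNA processing, and non-standard pairs such as G-U.

```python
# Pairing between a DNA template base and the RNA base laid down opposite it.
# In RNA, uracil (U) takes the place of thymine (T) and pairs with A.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (read 5'->3') complementary to a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

# Made-up template sequence, for illustration only.
rna = transcribe("TACGGATTC")
print(rna)
```

Writing the template 3'->5' keeps the sketch to a single lookup per base; a version that takes the template 5'->3' would also need to reverse the result.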

Despite these small chemical differences, DNA and RNA differ quite dra- 
matically in overall structure. Whereas DNA always occurs in cells as a double- 
stranded helix, RNA is single-stranded. RNA chains therefore fold up into a 
variety of shapes, just as a polypeptide chain folds up to form the final shape of 
a protein (Figure 6-6). As we see later in this chapter, the ability to fold into
complex three-dimensional shapes allows some RNA molecules to have structural
and catalytic functions. 



Transcription Produces RNA Complementary to 
One Strand of DNA 

All of the RNA in a cell is made by DNA transcription, a process that has cer- 
tain similarities to the process of DNA replication discussed in Chapter 5. 
Figure 6-89 Protein aggregates that cause human disease. (A) Schematic illustration of the type of
conformational change in a protein that produces material for a cross-beta filament. (B) Diagram illustrating
the self-infectious nature of the protein aggregation that is central to prion diseases. PrP is highly unusual
because the misfolded version of the protein, called PrP*, induces the normal PrP protein it contacts to
change its conformation, as shown. Most of the human diseases caused by protein aggregation are caused by
the overproduction of a variant protein that is especially prone to aggregation, but because this structure is
not infectious in this way, it cannot spread from one animal to another. (C) Drawing of a cross-beta filament,
a common type of protease-resistant protein aggregate found in a variety of human neurological diseases.
Because the hydrogen-bond interactions in a β sheet form between polypeptide backbone atoms (see Figure
3-9), a number of different abnormally folded proteins can produce this structure. (D) One of several
possible models for the conversion of PrP to PrP*, showing the likely change of two α-helices into four
β-strands. Although the structure of the normal protein has been determined accurately, the structure of the
infectious form is not yet known with certainty because the aggregation has prevented the use of standard
structural techniques. (C, courtesy of Louise Serpell, adapted from M. Sunde et al., J. Mol. Biol. 273:729-739,
1997; D, adapted from S.B. Prusiner, Trends Biochem. Sci. 21:482-487, 1996.)

animals and humans. It can be dangerous to eat the tissues of animals that con- 
tain PrP*, as witnessed most recently by the spread of BSE (commonly referred 
to as the "mad cow disease") from cattle to humans in Great Britain. 

Fortunately, in the absence of PrP*, PrP is extraordinarily difficult to convert 
to its abnormal form. Although very few proteins have the potential to misfold 
into an infectious conformation, a similar transformation has been discovered
to be the cause of an otherwise mysterious "protein-only inheritance" observed 
in yeast cells. 

There Are Many Steps From DNA to Protein

We have seen so far in this chapter that many different types of chemical reac- 
tions are required to produce a properly folded protein from the information 
contained in a gene (Figure 6-90). The final level of a properly folded protein in 
a cell therefore depends upon the efficiency with which each of the many steps 
is performed. 

We discuss in Chapter 7 that cells have the ability to change the levels of
their proteins according to their needs. In principle, any or all of the steps in
Figure 6-90 could be regulated by the cell for each individual protein. However, as
we shall see in Chapter 7, the initiation of transcription is the most common
point for a cell to regulate the expression of each of its genes. This makes sense,
inasmuch as the most efficient way to keep a gene from being expressed is to
block the very first step: the transcription of its DNA sequence into an RNA
molecule.

Figure 6-90 The production of a
protein by a eucaryotic cell. The final
level of each protein in a eucaryotic cell
depends upon the efficiency of each step
depicted.



Summary 

The translation of the nucleotide sequence of an mRNA molecule into protein takes 
place in the cytoplasm on a large ribonucleoprotein assembly called a ribosome. The 
amino acids used for protein synthesis are first attached to a family of tRNA 
molecules, each of which recognizes, by complementary base-pair interactions, par- 
ticular sets of three nucleotides in the mRNA (codons). The sequence of nucleotides in 
the mRNA is then read from one end to the other in sets of three according to the 
genetic code. 
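
Reading an mRNA "in sets of three" can be sketched in a few lines of Python. The function name `read_codons` and the example sequence are hypothetical. The sketch assumes the standard start codon (AUG) and the three standard stop codons (UAA, UAG, UGA), and it only splits the message into codons without translating them into amino acids.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def read_codons(mrna: str) -> list[str]:
    """Split an mRNA into codons, starting at the first AUG and stopping before a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is read
    codons = []
    # Step through the message three nucleotides at a time from the start codon.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

# Made-up message, for illustration only.
print(read_codons("GGAUGGCUUUCUAAGG"))
```

Because the reading frame is fixed by the position of the start codon, shifting the start by one nucleotide would produce an entirely different set of codons.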

To initiate translation, a small ribosomal subunit binds to the mRNA molecule 
at a start codon (AUG) that is recognized by a unique initiator tRNA molecule. A 
large ribosomal subunit binds to complete the ribosome and begin the elongation 
phase of protein synthesis. During this phase, aminoacyl tRNAs, each bearing a
specific amino acid, bind sequentially to the appropriate codon in mRNA by forming
complementary base pairs with the tRNA anticodon. Each amino acid is added to the 
C-terminal end of the growing polypeptide by means of a cycle of three sequential 
Figure 7-5 Six steps at which
eucaryotic gene expression can be
controlled. Controls that operate at
steps 1 through 5 are discussed in this
chapter. Step 6, the regulation of protein
activity, includes reversible activation or 
inactivation by protein phosphorylation 
(discussed in Chapter 3) as well as 
irreversible inactivation by proteolytic 
degradation (discussed in Chapter 6). 



Gene Expression Can Be Regulated at Many of the Steps 
in the Pathway from DNA to RNA to Protein 

If differences among the various cell types of an organism depend on the partic- 
ular genes that the cells express, at what level is the control of gene expression 
exercised? As we saw in the last chapter, there are many steps in the pathway 
leading from DNA to protein, and all of them can in principle be regulated. Thus 
a cell can control the proteins it makes by (1) controlling when and how often a 
given gene is transcribed (transcriptional control), (2) controlling how the RNA 
transcript is spliced or otherwise processed (RNA processing control), (3) 
selecting which completed mRNAs in the cell nucleus are exported to the cytosol 
and determining where in the cytosol they are localized (RNA transport and 
localization control), (4) selecting which mRNAs in the cytoplasm are translated 
by ribosomes (translational control), (5) selectively destabilizing certain mRNA 
molecules in the cytoplasm (mRNA degradation control), or (6) selectively acti- 
vating, inactivating, degrading, or compartmentalizing specific protein 
molecules after they have been made (protein activity control) (Figure 7-5). 

For most genes transcriptional controls are paramount. This makes sense
because, of all the possible control points illustrated in Figure 7-5, only tran-
scriptional control ensures that the cell will not synthesize superfluous interme-
diates. In the following sections we discuss the DNA and protein components
that perform this function by regulating the initiation of gene transcription. We
shall return at the end of the chapter to the additional ways of regulating gene
expression.

Summary 

The genome of a cell contains in its DNA sequence the information to make many
thousands of different protein and RNA molecules. A cell typically expresses only a
fraction of its genes, and the different types of cells in multicellular organisms arise
because different sets of genes are expressed. Moreover, cells can change the pattern
of genes they express in response to changes in their environment, such as signals
from other cells. Although all of the steps involved in expressing a gene can in prin-
ciple be regulated, for most genes the initiation of RNA transcription is the most
important point of control.

DNA-BINDING MOTIFS IN GENE REGULATORY 
PROTEINS 

How does a cell determine which of its thousands of genes to transcribe? As
mentioned briefly in Chapters 4 and 6, the transcription of each gene is con-
trolled by a regulatory region of DNA relatively near the site where transcription
begins. Some regulatory regions are simple and act as switches that are thrown
by a single signal. Many others are complex and act as tiny microprocessors,
responding to a variety of signals that they interpret and integrate to switch the
neighboring gene on or off. Whether complex or simple, these switching devices
occur in the germ line, the cell lineage that gives rise to sperm or eggs. Most of
the DNA in vertebrate germ cells is inactive and highly methylated. Over long 
periods of evolutionary time, the methylated CG sequences in these inactive 
regions have presumably been lost through spontaneous deamination events 
that were not properly repaired. However, promoters of genes that remain active
in the germ cell lineages (including most housekeeping genes) are kept 
unmethylated, and therefore spontaneous deaminations of Cs that occur with- 
in them can be accurately repaired. Such regions are preserved in modern day 
vertebrate cells as CG islands. In addition, any mutation of a CG sequence in the 
genome that destroyed the function or regulation of a gene in the adult would be 
selected against, and some CG islands are simply the result of a higher than nor- 
mal density of critical CG sequences. 

The mammalian genome contains an estimated 20,000 CG islands. Most of 
the islands mark the 5' ends of transcription units and thus, presumably, of 
genes. The presence of CG islands often provides a convenient way of identify- 
ing genes in the DNA sequences of vertebrate genomes. 
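
One common way to put a number on the CG deficiency, and on the CG enrichment that marks an island, is the observed/expected CG ratio. The sketch below is an illustration under stated assumptions rather than anything taken from the text: the function name `cg_obs_exp` is hypothetical, both example sequences are made up, and the formula (number of CG dinucleotides times sequence length, divided by the product of the C and G counts) is a widely used convention.

```python
def cg_obs_exp(seq: str) -> float:
    """Observed/expected CG dinucleotide ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CG dinucleotide is possible
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    cg_count = seq.count("CG")
    return cg_count * len(seq) / (c_count * g_count)

# Made-up sequences: the first is CG-rich, the second has C and G but no CG.
print(cg_obs_exp("ACGCGTTACGCG"))
print(cg_obs_exp("ACTGCATGCCAG"))
```

A ratio near 1 means CG occurs about as often as the base composition predicts; values well below 1 correspond to the CG depletion described above.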

Summary 

The many types of cells in animals and plants are created largely through mecha- 
nisms that cause different genes to be transcribed in different cells. Since many 
specialized animal cells can maintain their unique character through many cell 
division cycles and even when grown in culture, the gene regulatory mechanisms 
involved in creating them must be stable once established and heritable when the 
cell divides. These features endow the cell with a memory of its developmental history. 
Bacteria and yeasts provide unusually accessible model systems in which to study 
gene regulatory mechanisms. One such mechanism involves a competitive interac-
tion between two gene regulatory proteins, each of which inhibits the synthesis of the
other; this can create a flip-flop switch that switches a cell between two alternative
patterns of gene expression. Direct or indirect positive feedback loops, which enable
gene regulatory proteins to perpetuate their own synthesis, provide a general mech-
anism for cell memory. Negative feedback loops with programmed delays form the
basis for cellular clocks.

In eucaryotes the transcription of a gene is generally controlled by combinations 
of gene regulatory proteins. It is thought that each type of cell in a higher eucaryotic 
organism contains a specific combination of gene regulatory proteins that ensures 
the expression of only those genes appropriate to that type of cell. A given gene regu-
latory protein may be active in a variety of circumstances and typically is involved
in the regulation of many genes. 

In addition to diffusible gene regulatory proteins, inherited states of chromatin 
condensation are also used by eucaryotic cells to regulate gene expression. An espe- 
cially dramatic case is the inactivation of an entire X chromosome in female mam-
mals. In vertebrates DNA methylation also functions in gene regulation, being used
mainly as a device to reinforce decisions about gene expression that are made ini- 
tially by other mechanisms. DNA methylation also underlies the phenomenon of 
genomic imprinting in mammals, in which the expression of a gene depends on 
whether it was inherited from the mother or the father. 
Figure 7-86 A mechanism to explain
both the marked overall deficiency
of CG sequences and their clustering
into CG islands in vertebrate
genomes. A black line marks the location
of a CG dinucleotide in the DNA
sequence, while a red "lollipop" indicates
the presence of a methyl group on the
CG dinucleotide. CG sequences that lie in
regulatory sequences of genes that are
transcribed in germ cells are unmethylated
and therefore tend to be retained in
evolution. Methylated CG sequences, on
the other hand, tend to be lost through
deamination of 5-methyl C to T, unless the
CG sequence is critical for survival.



POSTTRANSCRIPTIONAL CONTROLS 

In principle, every step required for the process of gene expression could be 
controlled. Indeed, one can find examples of each type of regulation, although 
any one gene is likely to use only a few of them. Controls on the initiation of 
gene transcription are the predominant form of regulation for most genes. But 
other controls can act later in the pathway from DNA to protein to modulate 
the amount of gene product that is made. Although these posttranscriptional
controls, which operate after RNA polymerase has bound to the gene's promoter
and begun RNA synthesis, are less common than transcriptional control, for
many genes they are crucial.
CHAPTER 29

Regulation of transcription 
The phenotypic differences that distinguish the
various kinds of cells in a higher eukaryote are
largely due to differences in the expression of
genes that code for proteins, that is, those tran-
scribed by RNA polymerase II. In principle, the
expression of these genes might be regulated at
any one of several stages. The concept of the
"level of control" implies that gene expression
is not necessarily an automatic process once it
has begun. It could be regulated in a gene-
specific way at any one of several sequential
steps. We can distinguish (at least) five poten-
tial control points, forming the series:

Activation of gene structure
↓
Initiation of transcription
↓
Processing the transcript
↓
Transport to cytoplasm
↓
Translation of mRNA

The existence of the first step is implied by
the discovery that genes may exist in either of
two structural conditions. Relative to the state
of most of the genome, genes are found in
an "active" state in the cells in which they
are expressed (see Chapter 27). The change of
structure is distinct from the act of transcrip-
tion, and indicates that the gene is "transcrib-
able." This suggests that acquisition of the
"active" structure must be the first step in gene
expression.

Transcription of a gene in the active state is
controlled at the stage of initiation, that is, by
the interaction of RNA polymerase with its pro-
moter. This is now becoming susceptible to
analysis in the in vitro systems (see Chapter
28). For most genes, this is a major control
point; probably it is the most common level of
regulation.

There is at present no evidence for control
at subsequent stages of transcription in eukary-
otic cells, for example, via antitermination
mechanisms.

The primary transcript is modified by capping
at the 5' end, and usually also by polyadenyla-
tion at the 3' end. Introns must be spliced out
from the transcripts of interrupted genes. The
mature RNA must be exported from the nucleus
to the cytoplasm. Regulation of gene expression
by selection of sequences at the level of nuclear
RNA might involve any or all of these stages,
but the one for which we have most evidence
concerns changes in splicing; some genes are
expressed by means of alternative splicing pat-
terns whose regulation controls the type of pro-
tein product (see Chapter M>).

Finally, the translation of an mRNA in the cyto-
plasm can be specifically controlled. There is little
evidence for the employment of this mechanism in
adult somatic cells, but it does occur in some
embryonic situations, as described in Chapter 7.
The mechanism is presumed to involve the block-
ing of initiation of translation of some mRNAs by
specific protein factors.

But having acknowledged that control of gene
expression can occur at multiple stages, and
that production of RNA cannot inevitably be
equated with production of protein, it is clear
that the overwhelming majority of regulatory
events occur at the initiation of transcription.
Regulation of tissue-specific gene transcription
lies at the heart of eukaryotic differentiation;
indeed, we see examples in Chapter 38 in
which proteins that regulate embryonic devel-
opment prove to be transcription factors. A reg-
ulatory transcription factor serves to provide
common control of a large number of target
genes, and we seek to answer two questions
about this mode of regulation: what identifies
the common target genes to the transcription
factor; and how is the activity of the transcrip-
tion factor itself regulated in response to intrin-
sic or extrinsic signals?



Response elements identify genes under common 
regulation 



The principle that emerges from characterizing
groups of genes under common control is that
they share a promoter element that is recognized
by a regulatory transcription factor. An element
that causes a gene to respond to such a factor
is called a response element; examples are the
HSE (heat shock response element), GRE
(glucocorticoid response element), SRE (serum
response element).

The properties of some inducible transcription
factors and the elements that they recognize are
summarized in Table 29.1. Response elements
have the same general characteristics as
upstream elements of promoters or enhancers.
They contain short consensus sequences, and
copies of the response elements found in dif-
ferent genes are closely related, but not neces-
sarily identical. The region bound by the factor
extends for a short distance on either side of
the consensus sequence. In promoters, the ele-
ments are not present at fixed distances from
the startpoint, but are usually <200 bp upstream
of it. The presence of a single element usually
is sufficient to confer the regulatory response,
but sometimes there are multiple copies.

Table 29.1 Inducible transcription factors bind to
response elements that identify groups of promoters
or enhancers subject to coordinate control.

Regulatory Agent   Module   Consensus         Factor

Heat shock         HSE      CNNGAANNTCCNNG    HSTF
Glucocorticoid     GRE      TGGTACAAATGTTCT   Receptor
Phorbol ester      TRE      TGACTCA           AP1
Serum              SRE      CCATATTAGG        SRF
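
To make the idea of a consensus sequence concrete, the following Python sketch scans a DNA sequence for exact matches to a response-element consensus, treating N as "any base". It uses the HSE consensus listed in Table 29.1; the function names and the promoter sequence are hypothetical. As noted above, real copies of an element are closely related but not necessarily identical to the consensus, so an exact-match scan like this one is only a first approximation.

```python
import re

def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn a consensus such as CNNGAANNTCCNNG into a regex in which N matches any base."""
    return re.compile("".join("[ACGT]" if base == "N" else base
                              for base in consensus.upper()))

def find_elements(sequence: str, consensus: str) -> list[int]:
    """Return the 0-based start position of every match, including overlapping ones."""
    pattern = consensus_to_regex(consensus)
    seq = sequence.upper()
    # Try a match anchored at each position so that overlapping sites are not missed.
    return [i for i in range(len(seq)) if pattern.match(seq, i)]

HSE = "CNNGAANNTCCNNG"                  # heat shock element consensus from Table 29.1
promoter = "TTAGCTCGAATGTCCAGGATT"      # made-up sequence, for illustration only
print(find_elements(promoter, HSE))
```

Allowing a fixed number of mismatches, or scoring each position with a weight matrix, would be the natural next step for capturing related but non-identical copies.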

Response elements may be located in pro-
moters or in enhancers. Some types of element
are typically found in one rather than the other;
usually an HSE is found in a promoter, while a
GRE is found in an enhancer. We assume that
all response elements function by the same
general principle. A gene is regulated by a
sequence at the promoter or enhancer that is
recognized by a specific protein. The protein
functions as a transcription factor needed for
RNA polymerase to initiate. Active protein is
available only under conditions when the gene is
to be expressed; its absence means that the pro-
moter is not activated by this particular
An example of a situation in which many
genes are controlled by a single factor is pro-
vided by the heat shock response. This is com-
mon to a wide range of prokaryotes and
eukaryotes and involves multiple controls of
gene expression: an increase in temperature
turns off transcription of some genes, turns on
transcription of the heat shock genes, and
causes changes in the translation of mRNAs.
The control of the heat shock genes illustrates
the differences between prokaryotic and
eukaryotic modes of control. In bacteria, a new
sigma factor is synthesized that directs RNA
polymerase holoenzyme to recognize an
Abstract

Background: Prostate stem cell antigen (PSCA) is a recently defined homologue of the Thy-1/Ly-6 family of
glycosylphosphatidylinositol (GPI)-anchored cell surface antigens. The purpose of the present study was to
examine the expression status of PSCA protein and mRNA in clinical specimens of human prostate cancer (PCa)
and to validate it as a potential molecular target for diagnosis and treatment of PCa.

Materials and Methods: Immunohistochemical (IHC) and in situ hybridization (ISH) analyses of PSCA
expression were simultaneously performed on paraffin-embedded sections from 20 benign prostatic hyperplasia
(BPH), 20 prostatic intraepithelial neoplasm (PIN) and 48 prostate cancer (PCa) tissues, including 9 androgen-
independent prostate cancers. The level of PSCA expression was semiquantitatively scored by assessing both the
percentage and intensity of PSCA-positive staining cells in the specimens. PSCA expression was then compared
between BPH, PIN and PCa tissues, and the correlations of PSCA expression level with pathological grade,
clinical stage and progression to androgen-independence in PCa were analysed.

Results: In BPH and low grade PIN, PSCA protein and mRNA staining were weak or negative and less intense 
and uniform than that seen in HGPIN and Pea. There were moderate to strong PSCA protein and mRNA 
expression In 8 of II (72.7%) HGPIN and in 40 of 48 (83.4%) Pea specimens examined by IHC and ISH analyses, 
with statistical significance compared with BPH (20%) and low grade PIN (22.2%) samples (p < 0.0S, respectively). 
The expression level of PSCA increased with high Gleason grade, advanced stage and progression to androgen- 
independence (p < O.OS» respectively). In addition, IHC and ISH staining showed a high degree of correlation 
between PSCA protein and mRNA over expression, 

Conclusions: Our data demonstrate that PSCA as a new cell surface marker is overexpressed by a majority of 
human Pea. PSCA expression correlates positively with adverse tumor characteristics, such as increasing 
pathological grade (poor cell differentiation), worsening clinical stage and androgen-in dependence, and 
speculatively with prostate carcinogenesis. PSCA protein overexpression results from upregulated transcription 
of PSCA mRNA. PSCA may have prognostic utility and may be a promising molecular target for diagnosis and 
treatment of Pea. 
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Introduction 

Prostate cancer (PCa) is the second leading cause of cancer-related death in 
American men and is becoming increasingly common in China. Despite recent 
great progress in the diagnosis and management of localized disease, there 
continues to be a need for new diagnostic markers that can accurately 
discriminate between indolent and aggressive variants of PCa. There also 
continues to be a need for the identification and characterization of 
potential new therapeutic targets on PCa cells. Current diagnostic and 
therapeutic modalities for recurrent and metastatic PCa have been limited by 
a lack of specific target antigens of PCa. 

Although a number of prostate-specific genes have been identified (e.g. 
prostate specific antigen, prostatic acid phosphatase, glandular kallikrein 
2), the majority of these are secreted proteins not ideally suited for many 
immunological strategies. The identification of new cell surface antigens is 
therefore critical to the development of new diagnostic and therapeutic 
approaches to the management of PCa. 

Reiter RE et al [1] reported the identification of prostate stem cell antigen 
(PSCA), a cell surface antigen that is predominantly prostate specific. The 
PSCA gene encodes a 123 amino acid glycoprotein, with 30% homology to stem 
cell antigen 2 (Sca-2). Like Sca-2, PSCA is a member of the Thy-1/Ly-6 family 
and is anchored by a glycosylphosphatidylinositol (GPI) linkage. mRNA in situ 
hybridization (ISH) localized PSCA expression in normal prostate to the basal 
cell epithelium, the putative stem cell compartment of prostatic epithelium, 
suggesting that PSCA may be a marker of prostate stem/progenitor cells. 

In order to examine the status of PSCA protein and mRNA expression in human 
PCa and validate it as a potential diagnostic and therapeutic target for PCa, 
we used immunohistochemistry (IHC) and in situ hybridization (ISH) 
simultaneously, and conducted PSCA protein and mRNA expression analyses in 
paraffin-embedded tissue specimens of benign prostatic hyperplasia (BPH, n = 
20), prostate intraepithelial neoplasm (PIN, n = 20) and prostate cancer (PCa, 
n = 48). Furthermore, we evaluated the possible correlation of PSCA expression 
level with PCa tumorigenesis, grade, stage and progression to 
androgen-independence. 

Materials and methods 
Tissue samples 

All of the clinical tissue specimens studied herein were obtained from 80 
patients aged 57-84 years by prostatectomy, transurethral resection of the 
prostate (TURP) or biopsy. The patients were classified as 20 cases of BPH, 
20 cases of PIN and 40 cases of primary PCa, including 9 patients with 
recurrent PCa and a history of androgen ablation therapy (orchiectomy and/or 
hormonal therapy), who were referred to as androgen-independent prostate 
cancers. Eight specimens were harvested from these androgen-independent PCa 
patients prior to androgen ablation treatment. Each tissue sample was cut 
into two parts: one was fixed in 10% formalin for IHC and the other was 
treated with 4% paraformaldehyde/0.1 M PBS pH 7.4 in 0.1% DEPC for 1 h for 
ISH analysis, and then embedded in paraffin. All paraffin blocks examined 
were then cut into 5 µm sections and mounted on glass slides specific for 
IHC and ISH respectively in the usual fashion. An H&E-stained section of each 
PCa was evaluated and assigned a Gleason score by the experienced urological 
pathologist at our institution based on the Gleason scoring criteria [2]. 
The Gleason sums are summarized in Table 1. Clinical staging was performed 
according to the Jewett-Whitmore-Prout staging system, as shown in Table 2. 
In the category of PIN, we graded the specimens into two groups, i.e. low 
grade PIN (grade I - II) and high grade PIN (HGPIN, grade III), on the basis 
of the literature [3,4]. 

Immunohistochemical (IHC) analysis 
Briefly, tissue sections were deparaffinized, dehydrated, and subjected to 
microwaving in 10 mmol/L citrate buffer, pH 6.0 (Boshide, Wuhan, China) in a 
900 W oven for 5 min to induce epitope retrieval. Slides were allowed to cool 
at room temperature for 30 min. A primary mouse antibody specific to human 
PSCA (Boshide, Wuhan, China) at a 1:100 dilution was incubated with the 
slides at room temperature for 2 h. Labeling was detected by sequentially 
adding biotinylated secondary antibodies and streptavidin-peroxidase, and 
localized using the 3,3'-diaminobenzidine reaction. Sections were then 
counterstained with hematoxylin. Substitution of the primary antibody with 
phosphate-buffered saline (PBS) served as a negative-staining control. 

mRNA in situ hybridization (ISH) 

Five-µm-thick tissue sections were deparaffinized and dehydrated, then 
digested in pepsin solution (4 mg/ml in 3% citric acid) for 20 min at 
37.5 °C, and further processed for ISH. Digoxigenin-labeled sense and 
antisense human PSCA RNA probes (obtained from Boshide, Wuhan, China) were 
hybridized to the sections at 48 °C overnight. The high-stringency 
posthybridization wash was performed sequentially at 37 °C in 2 × standard 
saline citrate (SSC) for 10 min, in 0.5 × SSC for 15 min and in 0.2 × SSC for 
30 min. The slides were then incubated with biotinylated mouse 
anti-digoxigenin antibody at 37.5 °C for 1 h followed by washing in 1 × PBS 
for 20 min at room temperature, and then with streptavidin-peroxidase at 
37.5 °C for 20 min followed by washing in 1 × PBS for 15 min at room 
temperature. Subsequently, the slides were developed with diaminobenzidine 
and then counterstained with hematoxylin to localize the hybridization 
signals. Sections hybridized with the sense control probes routinely did not 
show any specific hybridization signal above background. All slides were 
also hybridized with PBS substituted for the probes as a negative control. 

World Journal of Surgical Oncology 2004, 2:13    http://www.wjso.com/content/2/1/13 

Table 1: Correlation of PSCA expression with Gleason score 

                  Intensity × frequency 
Gleason score     0-6 (%)       9 (%) 
2-4               5 (83)        1 (17) 
5-7               19 (79)       5 (21) 
8-10              5 (28)        13 (72) 

Table 2: Correlation of PSCA expression with clinical stage 

                  Intensity × frequency 
Tumor stage       0-6 (%)       9 (%) 
A-B               27 (67.5)     13 (32.5) 
C-D               2 (25)        6 (75) 

Scoring methods 

To determine the correlation between the results of PSCA immunostaining and 
mRNA in situ hybridization, the same scoring method was used in the present 
study for PSCA protein staining by IHC and PSCA mRNA staining by ISH. Each 
slide was read and scored independently by two experienced urological 
pathologists using Olympus BX-41 light microscopes. The evaluation was done 
in a blinded fashion. For each section, five areas of similar grade were 
analyzed semiquantitatively for the fraction of cells staining. Fifty 
percent of specimens were randomly chosen and rescored to determine the 
degree of interobserver and intraobserver concordance. There was greater 
than 95% intra- and interobserver agreement. 

The intensity of PSCA expression evaluated microscopically was graded on a 
scale of 0 to 3+, with 3+ being the highest expression observed (0, no 
staining; 1+, mildly intense; 2+, moderately intense; 3+, severely intense). 
The staining density was quantified as the percentage of cells staining 
positive for PSCA with the primary antibody or hybridization probe, as 
follows: 0 = no staining; 1 = positive staining in <25% of the sample; 2 = 
positive staining in 25%-50% of the sample; 3 = positive staining in >50% of 
the sample. The intensity score (0 to 3+) was multiplied by the density score 
(0-3) to give an overall score of 0-9 [1,5]. In this way, we were able to 
differentiate specimens that may have had focal areas of increased staining 
from those that had diffuse areas of increased staining [6]. The overall 
score for each specimen was then categorically assigned to one of the 
following groups: score 0, negative expression; scores 1-2, weak expression; 
scores 3-6, moderate expression; score 9, strong expression. 
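
The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text, not the authors' software; the function names (density_grade, composite_score, expression_category) are hypothetical, and the handling of the exact 25% and 50% boundaries is an assumption, since the text does not say which grade a sample at exactly 50% receives.

```python
# Every product of an intensity grade (0-3) and a density grade (0-3).
VALID_SCORES = {0, 1, 2, 3, 4, 6, 9}


def density_grade(percent_positive: float) -> int:
    """Grade the share of positive cells: 0 none, 1 <25%, 2 25-50%, 3 >50%."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must lie between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive < 25:
        return 1
    if percent_positive <= 50:  # assumption: exactly 50% is graded 2
        return 2
    return 3


def composite_score(intensity: int, density: int) -> int:
    """Multiply the intensity grade by the density grade (overall score 0-9)."""
    if intensity not in range(4) or density not in range(4):
        raise ValueError("intensity and density must each be 0, 1, 2 or 3")
    return intensity * density


def expression_category(score: int) -> str:
    """Map an overall score to the expression group named in the text."""
    if score not in VALID_SCORES:
        raise ValueError("score is not a product of two grades in 0-3")
    if score == 0:
        return "negative"
    if score <= 2:
        return "weak"
    if score <= 6:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    focal = composite_score(3, density_grade(10))    # intense but focal
    diffuse = composite_score(3, density_grade(80))  # intense and diffuse
    print(focal, expression_category(focal))      # 3 moderate
    print(diffuse, expression_category(diffuse))  # 9 strong
```

The two calls at the end show why the product is used: equally intense staining is scored lower when it is confined to a small fraction of cells than when it is diffuse.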

Statistical analysis 

Intensity and density of PSCA protein and mRNA expression in BPH, PIN and 
PCa tissues were compared using the Chi-square and Student's t-test. 
Univariate associations between PSCA expression and Gleason score, clinical 
stage and progression to androgen-independence were calculated using 
Fisher's Exact Test. For all analyses, p < 0.05 was considered statistically 
significant. 
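
As an illustration of the univariate test named above, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2 × 2 table. It is not the authors' analysis code: the function name is hypothetical, the two-sided rule used (summing all tables no more probable than the observed one) is one common convention, and the example counts are the stage counts as read from Table 2 (A-B: 27 with scores 0-6 and 13 with score 9; C-D: 2 and 6).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Number of ways to obtain x in the top-left cell (exact integers).
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(total, col1)


if __name__ == "__main__":
    # Rows: stage A-B, stage C-D. Columns: score 0-6, score 9.
    p_value = fisher_exact_two_sided(27, 13, 2, 6)
    print(f"p = {p_value:.4f}")
```

Under these assumptions the sketch gives a p-value of roughly 0.045 for the stage counts, which is below the 0.05 threshold stated in the text. Integer weights are compared rather than floating-point probabilities so that ties between equally probable tables are handled exactly.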

Results 

PSCA expression in BPH 

In general, PSCA protein and mRNA were expressed weakly in individual 
samples of BPH. Some areas of prostate expressed weak levels (composite 
score 1-2), whereas other areas were completely negative (composite score 
0). Four cases (20%) of BPH had moderate expression of PSCA protein and mRNA 
(composite score 4-6) by IHC and ISH. In 2/20 (10%) BPH specimens, PSCA mRNA 
expression was moderate (composite score 3-6), but PSCA protein expression 
was weak (composite score 2) in one and negative (composite score 0) in the 
other. PSCA expression was localized to the basal and secretory epithelial 
cells, and prostatic stroma stained almost negative for PSCA protein and 
mRNA in all cases examined. 

PSCA expression in PIN 

In this study, we detected weak or negative expression of PSCA protein and 
mRNA (scores ≤2) in 7 of 9 (77.8%) low grade PIN and in 2 of 11 (18.2%) 
HGPIN, and moderate expression (scores 3-6) in the remaining 2 low grade PIN 
and 5 of 11 (45.5%) HGPIN. One HGPIN with moderate PSCA mRNA expression 
(score 6) showed weak staining for PSCA protein (score 2) by IHC. Strong 
PSCA protein and mRNA expression (score 9) was detected in the remaining 3 
of 11 (27.3%) HGPIN. There was a statistically significant difference in 
PSCA protein and mRNA expression levels between HGPIN and BPH (p < 0.05), 
but no statistically significant difference between low grade PIN and BPH 
(p > 0.05). 

PSCA expression in PCa 

In order to determine if PSCA protein and mRNA can be detected in prostate 
cancers and if PSCA expression levels are increased in malignant compared 
with benign glands, 48 paraffin-embedded PCa specimens were analysed by IHC 
and ISH. It was shown that 19 of 48 (39.6%) PCa samples stained very 
strongly for PSCA protein and mRNA with a score of 9 and another 21 (43.8%) 
specimens displayed moderate staining with scores of 4-6 (Figure 1). In 
addition, 4 specimens with moderate to strong PSCA mRNA expression (scores 
of 4-9) had weak protein staining (a score of 2) by IHC analyses. Overall, 
PCa expressed a significantly higher level of PSCA protein and mRNA than any 
other specimen category in this study (p < 0.05, compared with BPH and PIN 
respectively). The result demonstrates that PSCA protein and mRNA are 
overexpressed by a majority of human PCa. 

Correlation of PSCA expression with Gleason score in PCa 
Using the semi-quantitative scoring method described in Materials and 
Methods, we compared the expression level of PSCA protein and mRNA with the 
Gleason grade of PCa, as shown in Table 1. Prostate adenocarcinomas were 
graded by Gleason score as 2-4 = well differentiated, 5-7 = moderately 
differentiated and 8-10 = poorly differentiated [7]. Seventy-two percent of 
prostate cancers with Gleason scores 8-10 had very strong staining of PSCA, 
compared to 21% with Gleason scores 5-7 and 17% with 2-4 respectively, 
demonstrating that poorly differentiated PCa had significantly stronger 
expression of PSCA protein and mRNA than moderately and well differentiated 
tumors (p < 0.05). As depicted in Figure 1, IHC and ISH analyses showed that 
PSCA protein and mRNA expression in several cases of poorly differentiated 
PCa was particularly prominent, with more intense and uniform staining. The 
results indicate that PSCA expression increases significantly with higher 
tumor grade in human PCa. 
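
The percentages quoted for each Gleason group follow directly from the specimen counts. The short sketch below, with a hypothetical helper name and the counts as read from Table 1 (moderate-or-lower scores 0-6 first, strong score 9 second), reproduces that arithmetic.

```python
def row_percentages(counts):
    """Convert each row of counts into whole-number percentages of the row."""
    result = {}
    for label, row in counts.items():
        total = sum(row)
        result[label] = tuple(round(100 * n / total) for n in row)
    return result


# Gleason score group -> (specimens scoring 0-6, specimens scoring 9)
gleason_counts = {"2-4": (5, 1), "5-7": (19, 5), "8-10": (5, 13)}

if __name__ == "__main__":
    for group, (low, strong) in row_percentages(gleason_counts).items():
        print(f"Gleason {group}: {low}% score 0-6, {strong}% score 9")
    # Gleason 2-4: 83% score 0-6, 17% score 9
    # Gleason 5-7: 79% score 0-6, 21% score 9
    # Gleason 8-10: 28% score 0-6, 72% score 9
```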

Correlation of PSCA expression with clinical stage in PCa 
The results for PSCA expression at each stage of PCa are shown in Table 2. 
Seventy-five percent of locally advanced and node positive cancers (i.e. C-D 
stages) expressed high levels of PSCA, versus 32.5% of organ-confined 
cancers (i.e. A-B stages) (p < 0.05). The data demonstrate that PSCA 
expression increases significantly with advanced tumor stage in human PCa. 

Correlation of PSCA expression with androgen-independent progression of PCa 

All 9 specimens of androgen-independent prostate cancers stained positive 
for PSCA protein and mRNA. Eight specimens were obtained from patients 
managed prior to androgen ablation therapy. Seven of eight (87.5%) of these 
androgen-independent prostate cancers were in the strongest staining 
category (score = 9), compared with three out of eight (37.5%) of patients 
with androgen-dependent cancers (p < 0.05). The results demonstrate that 
PSCA expression increases significantly with progression to 
androgen-independence of human PCa. 

It is evident from the results above that within a majority of human 
prostate cancers the level of PSCA protein and mRNA expression correlates 
significantly with increasing grade, worsening stage and progression to 
androgen-independence. 

Correlation of PSCA immunostaining and mRNA in situ 
hybridization 

In all 88 specimens surveyed herein, we compared the results of PSCA IHC 
staining with mRNA ISH analysis. Positive staining areas and their intensity 
and density scores evaluated by IHC were identical to those seen by ISH in 
79 of 88 (89.8%) specimens (18/20 BPH, 19/20 PIN and 42/48 PCa 
respectively). Importantly, 27/27 samples with PSCA mRNA composite scores of 
0-2, 32/36 samples with scores of 3-6 and 22/24 samples with a score of 9 
also had PSCA protein expression scores of 0-2, 3-6 and 9 respectively. 
However, in 5 samples with PSCA mRNA overall scores of 3-6 and in 2 with 
scores of 9 there was weaker or negative PSCA protein expression (i.e. 
scores of 0-4), suggesting that this may reflect posttranscriptional 
modification of PSCA or that the epitopes recognized by the PSCA mAb may be 
obscured in some cancers. The data demonstrate that the results of PSCA 
immunostaining were consistent with those of mRNA ISH analysis, showing a 
high degree of correlation between PSCA protein and mRNA expression. 
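
The agreement figures above are simple proportions of concordant specimens. A minimal sketch, assuming the per-group counts quoted in the text and using hypothetical names, shows how the overall 79 of 88 (89.8%) figure arises from the three tissue groups.

```python
def percent_agreement(concordant: int, total: int) -> float:
    """Percentage of specimens whose IHC and ISH scores were identical."""
    if total <= 0 or not 0 <= concordant <= total:
        raise ValueError("need 0 <= concordant <= total and total > 0")
    return round(100 * concordant / total, 1)


# Tissue group -> (specimens with identical IHC and ISH results, specimens examined)
agreement = {"BPH": (18, 20), "PIN": (19, 20), "PCa": (42, 48)}

overall_concordant = sum(c for c, _ in agreement.values())
overall_total = sum(t for _, t in agreement.values())

if __name__ == "__main__":
    for group, (concordant, total) in agreement.items():
        print(group, percent_agreement(concordant, total))
    print("overall", overall_concordant, overall_total,
          percent_agreement(overall_concordant, overall_total))
    # overall 79 88 89.8
```

Simple percent agreement does not correct for agreement expected by chance; the text reports only the raw proportion, so that is all the sketch computes.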
Figure 1 

Representatives of PSCA IHC and ISH staining in PCa (A: IHC staining; B: ISH staining; ×200 magnification). A1, B1: negative 
control of IHC and ISH. PBS replacing the primary antibody (A1) and hybridization with a sense PSCA probe (B1) showed no 
background staining. A2, B2: a moderately differentiated PCa (Gleason score = 3+3 = 6) with moderate staining (composite score = 
6) in all malignant cells; A2: IHC shows not only cell surface but also apparent cytoplasmic staining of PSCA protein. A3, B3: a 
poorly differentiated PCa (Gleason score = 4+4 = 8) with very strong staining (composite score = 9) in all malignant cells. 

Discussion 

PSCA is homologous to a group of cell surface proteins that mark the 
earliest phase of hematopoietic development. PSCA mRNA expression is 
prostate-specific in normal male tissues and is highly up-regulated in both 
androgen-dependent and androgen-independent PCa xenografts (LAPC-4 tumors). 
We hypothesize that PSCA may play a role in PCa tumorigenesis and 
progression, and may serve as a target for PCa diagnosis and treatment. In 
this study, IHC and ISH showed that in general there was weak or absent PSCA 
protein and mRNA expression in BPH and low grade PIN tissues. However, PSCA 
protein and mRNA are widely expressed in HGPIN, the putative precursor of 
invasive PCa, suggesting that up-regulation of PSCA is an early event in 
prostate carcinogenesis. Recently, Reiter RE et al [1], using ISH analysis, 
reported that 97 of 118 (82%) HGPIN specimens stained strongly positive for 
PSCA mRNA. A very similar finding was reported for mouse PSCA (mPSCA) 
expression in mouse HGPIN tissues by Tran CP et al [8]. These data suggest 
that PSCA may be a new marker associated with transformation of prostate 
cells and tumorigenesis. 

We observed that PSCA protein and mRNA are highly expressed in a large 
percentage of human prostate cancers, including advanced, poorly 
differentiated, androgen-independent and metastatic cases. 
Fluorescence-activated cell sorting and confocal/immunofluorescent studies 
demonstrated cell surface expression of PSCA protein in PCa cells [9]. Our 
IHC expression analysis of PSCA shows not only cell surface but also 
apparent cytoplasmic staining of PSCA protein in PCa specimens (Figure 1). 
One possible explanation for this is that the anti-PSCA antibody can 
recognize PSCA peptide precursors that reside in the cytoplasm. It is also 
possible that the positive staining that appears in the cytoplasm is 
actually from the overlying cell membrane [5]. These data seem to indicate 
that PSCA is a novel cell surface marker for human PCa. 

Our results show that an elevated level of PSCA expression correlates with 
high grade (i.e. poor differentiation), increased tumor stage and 
progression to androgen-independence of PCa. These findings support the 
original IHC analyses by Gu Z et al [9], who reported that PSCA protein was 
expressed in 94% of primary PCa and that the intensity of PSCA protein 
expression increased with tumor grade, stage and progression to 
androgen-independence. Our results also corroborate the recent work of Han 
KR et al [10], in which a significant association was found between high 
PSCA expression and adverse prognostic features such as high Gleason score, 
seminal vesicle invasion and capsular involvement in PCa. It is suggested 
that PSCA overexpression may be an adverse predictor for recurrence, 
clinical progression or survival of PCa. Hara H et al [11] used RT-PCR 
detection of PSA, PSMA and PSCA in 1 ml of peripheral blood to evaluate PCa 
patients with poor prognosis. The results showed that among 58 PCa patients 
the prognostic value of the assays ranked PSCA > PSA > PSMA RT-PCR, and 
extraprostatic cases with positive PSCA PCR had lower 
disease-progression-free survival than those with negative PSCA PCR, 
demonstrating that PSCA can be used as a prognostic factor. Dubey P et al 
[12] reported that elevated numbers of PSCA+ cells correlate positively with 
the onset and development of prostate carcinoma over a long time span in the 
prostates of the TRAMP and PTEN +/- models compared with normal prostates. 
Taken together with our present findings, in which PSCA is overexpressed 
from HGPIN to almost frank carcinoma, it is reasonable and possible to use 
an increased PSCA expression level or increased numbers of PSCA-positive 
cells in prostate samples as a prognostic marker to predict the potential 
onset of this cancer. These data raise the possibility that PSCA may have 
diagnostic utility or clinical prognostic value in human PCa. 

The cause of PSCA overexpression in PCa is not known. One possible mechanism 
is that it may result from PSCA gene amplification. In humans, PSCA is 
located on chromosome 8q24.2 [1], which is often amplified in metastatic and 
recurrent PCa and considered to indicate a poor prognosis [13-15]. 
Interestingly, PSCA is in close proximity to the c-myc oncogene, which is 
amplified in >20% of recurrent and metastatic prostate cancers [16,17]. 
Reiter RE et al [18] reported that PSCA and MYC gene copy numbers were 
co-amplified in 25% of tumors (five out of twenty), demonstrating that PSCA 
overexpression is associated with PSCA and MYC coamplification in PCa. Gu Z 
et al [9] recently reported that in 102 specimens available to compare the 
results of PSCA immunostaining with their previous mRNA ISH analysis, 92 
(90.2%) had identically positive areas of PSCA protein and mRNA expression. 
Taken together with our findings, in which we detected moderate to strong 
expression of PSCA protein and mRNA in 34 of 40 (85%) PCa specimens examined 
simultaneously by IHC and ISH analyses, it is demonstrated that PSCA protein 
and mRNA are overexpressed in human PCa, and that the increased protein 
level of PSCA resulted from the upregulated transcription of its mRNA. 

At present, the regulatory mechanisms of human PSCA expression and its 
biological function are yet to be elucidated. PSCA expression may be 
regulated by multiple factors [18]. Watabe T et al [19] reported that 
transcriptional control is a major component regulating PSCA expression 
levels. In addition, induction of PSCA expression may be regulated or 
mediated through cell-cell contact and protein kinase C (PKC) [20]. 
Homologues of PSCA have diverse activities, and have themselves been 
involved in carcinogenesis. Signalling through SCA-2 has been demonstrated 
to prevent apoptosis in immature thymocytes [21]. Thy-1 is involved in T 
cell activation and transduces signals through src-like tyrosine kinases 
[22]. Ly-6 genes have been implicated both in tumorigenesis and in cell-cell 
adhesion [23-25]. Cell-cell or cell-matrix interaction is critical for local 
tumor growth and spread to distal sites. From its restricted expression in 
basal cells of normal prostate and its homology to SCA-2, PSCA may play a 
role in stem/progenitor cell function, such as self-renewal (i.e. 
anti-apoptosis) and/or proliferation [1]. Taken together with the results of 
the present study, we speculate that PSCA may play a role in tumorigenesis 
and clinical progression of PCa through affecting cell transformation and 
proliferation. Our results also suggest that PSCA as a new cell surface 
antigen may have a number of potential uses in the diagnosis, therapy and 
clinical prognosis of human PCa. PSCA overexpression in prostate biopsies 
could be used to identify patients at high risk of developing recurrent or 
metastatic disease, and to discriminate cancers from normal glands in 
prostatectomy samples. Similarly, the detection of PSCA-overexpressing cells 
in bone marrow or peripheral blood may identify and predict metastatic 
progression better than current assays, which identify only PSA-positive or 
PSMA-positive prostate cells. 

In summary, we have shown in this study that PSCA protein and mRNA 
expression is maintained from HGPIN through all stages of PCa in a majority 
of cases, which may be associated with prostate carcinogenesis and 
correlates positively with high tumor grade (poor cell differentiation), 
advanced stage and androgen-independent progression. PSCA protein 
overexpression is due to the upregulation of its mRNA transcription. The 
results suggest that PSCA may be a promising molecular marker for the 
clinical prognosis of human PCa and a valuable target for diagnosis and 
therapy of this tumor. 

Abstract 

Background: Colorectal cancer is a common cancer all over the world. Aberrations in the cell 
cycle checkpoints have been shown to be of prognostic significance in colorectal cancer. 

Methods: The expression of cyclin D1, cyclin A, histone H3 and Ki-67 was examined in 60 colorectal 
cancer cases for co-regulation and impact on overall survival using immunohistochemistry, 
Southern blot and in situ hybridization techniques. Immunoreactivity was evaluated 
semiquantitatively by determining the staining index of the studied proteins. 

Results: There was a significant correlation between cyclin D1 gene amplification and protein 
overexpression (concordance = 63.6%) and between Ki-67 and the other studied proteins. The 
staining index for Ki-67, cyclin A and D1 was higher in large, poorly differentiated tumors. The 
staining index of cyclin D1 was significantly higher in cases with deeply invasive tumors and nodal 
metastasis. Overexpression of cyclin A and D1 and amplification of cyclin D1 were associated with 
reduced overall survival. Multivariate analysis shows that cyclin D1 and A are two independent 
prognostic factors in colorectal cancer patients. 

Conclusions: Loss of cell cycle checkpoint control is common in colorectal cancer. Cyclin A and 
D1 are superior independent indicators of poor prognosis in colorectal cancer patients. Therefore, 
they may help in predicting the clinical outcome of those patients on an individual basis and could 
be considered important therapeutic targets. 



Background 

Colorectal cancer (CRC) is the third most common cancer in Western 
countries [1]. In Egypt, CRC has unique characteristics that differ from 
those reported in other countries of Western society. It was estimated that 
35.6% of the Egyptian CRC cases are below 40 years of age and patients 
usually present with advanced stage, high grade tumors that carry more 
mutations [2]. This uniquely high proportion of early-onset CRC, the early 
and continuous exposure to hazardous environmental agents, the different 
mutational spectrum and the prevalent consanguinity in Egypt justify further 
studies [3]. It has been shown that most cancers result from accumulation of 
genetic alterations involving certain groups of genes, the majority of which 
are cell cycle regulators that either stimulate or inhibit cell cycle 
progression [1]. Cell proliferation allows orderly progression through the 
cell cycle, which is governed by a number of proteins including cyclins and 
cyclin-dependent kinases [4,5]. The cyclins belong to a superfamily of genes 
whose products complex with various cyclin-dependent kinases (cdks) to 
regulate transitions through key checkpoints of the cell cycle [6]. 
Abnormalities of several cyclins have been reported in different tumor 
types, implicating, in particular, cyclin A, cyclin E and cyclin D [6,7]. 

Cyclin D1 is a G1 cyclin that regulates the transition from G1 to S phase 
since its peak level and maximum activity are reached during the G1 phase of 
the cell cycle, whereas cyclin A is regarded as a regulator of the 
transition to mitosis since it reaches its maximum level during the S and G2 
phases [8]. The mechanisms likely to activate the oncogenic properties of 
the cyclins include chromosomal translocations, gene amplification and 
aberrant protein overexpression [7,9]. 

Several studies have shown that histone H3 mRNA expression can be used to 
identify the S phase fraction (SPF) through the in situ hybridization (ISH) 
technique [10,11]. The level of histone H3 mRNA reaches its peak during the 
S phase and then drops rapidly at the G2 phase [12]. 

In the face of the increasing incidence of CRC and its peculiar pattern in 
the Egyptian population, the present study was conducted to assess the role 
of Ki-67 (pan-cell cycle marker), cyclin D1 (G1 phase marker), histone H3 
mRNA (S phase marker) and cyclin A (S to G2 phase marker) in CRC. The 
expression level of these markers was correlated with the clinicopathologic 
features and the overall survival of patients. 



Table 1: Clinicopathological features of patients in relation to the staining index (SI) of Ki-67, cyclin D1, cyclin A and histone H3 

                                     SI (mean ± SD) 
Variables           No. of cases     Ki-67          Cyclin D1      Cyclin A       Histone H3 
Sex 
  Male              36               18.0 ± 6.4     6.7 ± 4.3      12.7 ± 5.7     10.7 ± 5.3 
  Female            24               20.1 ± 5.8     8.8 ± 8.4      10.0 ± 6.0     10.7 ± 5.4 
Age (years) 
  ≥50               41               11.7 ± 6.0*    5.6 ± 5.2      10.0 ± 5.3     6.0 ± 5.0* 
  <50               19               23.8 ± 5.6     7.7 ± 6.8      13.6 ± 5.7     22.0 ± 5.2 
Tumor size (cm) 
  <5.0              33               12.2 ± 6.3*    5.3 ± 3.8*     11.5 ± 6.1*    10.3 ± 4.9* 
  ≥5.0              27               30.1 ± 6.2     22.8 ± 7.2     28.6 ± 5.6     24.0 ± 5.6 
Histology 
  Normal            20               3.5 ± 2.0*     0.6 ± 0.2*     2.3 ± U*       2.2 ± 0.9 
  Carcinoma         60               30.3 ± 6.2     24.9 ± 6.3     27.2 ± 5.8     10.7 ± 5.3 
  GI                15               11.7 ± 6.2     6.6 ± 4.0      10.0 ± 5.4     11.4 ± 4.9 
  GII               21               11.8 ± 5.6     8.9 ± 3.6      12.3 ± 6.5     7.8 ± 5.4 
  GIII              24               30.0 ± 4.3     22.0 ± 8.1     27.0 ± 4.9     11.5 ± 5.4 
Lymph node 
  Negative          33               19.5 ± 7.0     5.4 ± 5.3*     11.9 ± 6.5     12.3 ± 5.5 
  Positive          27               21.3 ± 4.9     20.6 ± 6.9     12.5 ± 5.0     14.2 ± 5.0 
Depth of invasion 
  m, sm             17               20.7 ± 6.7     3.1 ± 3.1*     11.9 ± 7.2     10.4 ± 5.1 
  beyond sm         43               21.9 ± 6.2     12.4 ± 6.5     12.2 ± 5.6     10.7 ± 5.4 
Stage 
  I                 6                20.6 ± 6.7     5.7 ± 6.9      24.2 ± 6.9     11.1 ± 5.3 
  II                27               20.8 ± 6.9     5.3 ± 4.3      24.6 ± 6.0     10.4 ± 5.7 
  III               12               22.0 ± 5.4     7.7 ± 6.0      27.1 ± 5.2     10.4 ± 4.9 
  IV                15               24.7 ± 6.1     11.3 ± 9.6     27.5 ± 5.5     12.3 ± 6.2 

* p value < 0.05 (significant) 

Methods 

Tissue samples 

Paraffin-embedded tumor tissues were obtained from 60 CRC patients (47 colon 
and 13 rectal carcinomas) who were diagnosed and treated at the National 
Cancer Institute, Cairo, Egypt during the period from January 1997 to June 
2002. Clinicopathological data of the studied cases are presented in Table 
1. None of the patients received any chemotherapy or irradiation prior to 
surgery. Histological diagnosis of all cases was done by 2 independent 
pathologists according to the WHO Histological Classification. Tumors were 
staged according to the TNM staging system [13]. The depth of tumor invasion 
was classified as invasion of the mucosa including muscularis mucosa (m), 
invasion of the submucosa (sm), or invasion beyond the submucosa [8]. Normal 
colonic tissues were obtained from autopsy specimens (n = 20) and were used 
as a control. The actual survival of the patients was calculated from the 
date of resection to the date of death. 

Immunohistochemistry

Four-micron sections of each normal and tumor specimen
were cut onto positively charged slides, air-dried
overnight, de-paraffinized in xylene, hydrated through a
series of graded alcohols and washed in distilled water
and 0.01 PBS (pH 7.4). Slides were then processed for IHC
as described by Handa et al. [8] using the following
antibodies: Ki-67 (MIB-1, Dako), cyclin A (6E6; Novocastra,
Newcastle-Upon-Tyne, UK) and cyclin D1 (DCS-6, Dako).
A case of invasive breast cancer was used as a positive
control for Ki-67 and cyclin A, whereas a case of mantle
cell lymphoma was used as a control for cyclin D1.
Negative controls were obtained by replacing the primary
antibody with non-immunized rabbit or mouse serum.

Brown nuclear staining was regarded as a positive result
for all studied markers. The proportion of positively
stained cells and the intensity of staining were scored in
tumor and normal colorectal mucosal sections at medium
power (×200). The degree of positive tumor staining
(percentage of positive tumor cells in the examined
section) was scored from 1-6 and the staining intensity was
scored from 0-6 according to the pattern of staining in the
examined section. The staining index (SI) was calculated by
multiplying the cellularity (proportion) score by the
staining intensity score as described by King et al. [14].
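
The scoring rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the range checks are hypothetical, and the text does not say how scores from several fields were combined into one value per case.

```python
def staining_index(proportion_score: int, intensity_score: int) -> int:
    """Staining index (SI) = proportion score x intensity score.

    Score ranges follow the text: proportion 1-6, intensity 0-6.
    The function name and the validation are illustrative assumptions.
    """
    if not 1 <= proportion_score <= 6:
        raise ValueError("proportion score must be between 1 and 6")
    if not 0 <= intensity_score <= 6:
        raise ValueError("intensity score must be between 0 and 6")
    return proportion_score * intensity_score


# Hypothetical section: proportion score 4, intensity score 3
print(staining_index(4, 3))
```

A section with no staining (intensity score 0) gets an SI of 0 whatever its proportion score, which is the behavior the multiplicative definition implies.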

In situ hybridization 

All tumor samples and 5 normal controls were assessed
for histone H3 mRNA by ISH using the commercially
available 550 base fluorescein-labeled DNA probe (Dako,
Carpinteria, CA) as described by Nagao et al., 1996. This
probe hybridizes to the whole mRNA transcript of the
human histone H3 gene, including the 5' and 3'
untranslated regions. Histone H3 mRNA was scored as for
immunohistochemistry; however, hybridization signals
were detected in the cytoplasm.

Molecular detection of cyclin D1 gene amplification

High molecular weight DNA was extracted from
paraffin-embedded tissues of the tumor and normal
colorectal mucosal samples as previously described [15].
The proportion of neoplastic and normal cells was
determined in each tumor sample by examining hematoxylin
and eosin-stained slides obtained from the edge of the
specimen used for DNA extraction. Tumor samples were
evaluated for amplification of cyclin D1 if more than 75%
of the examined sections were formed of neoplastic cells.
Accordingly, 50 cases were eligible for the analysis. Ten
micrograms of the extracted DNA was digested with
EcoRI. DNA from selected cases was also digested with
BglII and HindIII. Samples were separated on 0.8%
agarose gels and transferred to Hybond-N membranes
(Amersham Int., Amersham, UK). The membranes were
hybridized in 50% formamide, 5 × SSC, 5 × Denhardt's,
500 µg/ml denatured salmon sperm DNA, 10% dextran
sulphate and 10^6 cpm/ml of 32P-labeled PRAD-1 probe for
24 h. Membranes were washed with 2 × SSC, 0.1% SDS at
room temperature for 30 min, followed by 2 × SSC, 0.1%
SDS at 60°C for 30 min and 0.1 × SSC, 0.1% SDS at 60°C
for 1 h. Filters were autoradiographed using an
intensifying screen at -70°C for 24-72 h. After being
stripped of the PRAD-1 probe, the same blots were
hybridized with a 32P-labeled β-actin probe to normalize
against possible variations in the loading or transfer of
DNA. The autoradiograms were analyzed using a
densitometer. Intensities of PRAD-1/cyclin D1 were
normalized to the β-actin control bands. The degree of
amplification was calculated from these normalized values.
Amplification was considered present when the signal of
the tumor band was >2-fold the value of the matched
normal mucosa [16].
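
The normalization and the 2-fold rule described above can be written out as a short sketch. The function names and the densitometry readings are hypothetical; the text gives only the rule, not the raw band intensities.

```python
def fold_amplification(tumor_cyclin_d1: float, tumor_actin: float,
                       normal_cyclin_d1: float, normal_actin: float) -> float:
    """Ratio of the beta-actin-normalized PRAD-1/cyclin D1 signal in the
    tumor to the same ratio in the matched normal mucosa."""
    tumor_norm = tumor_cyclin_d1 / tumor_actin
    normal_norm = normal_cyclin_d1 / normal_actin
    return tumor_norm / normal_norm


def is_amplified(fold: float, threshold: float = 2.0) -> bool:
    """Amplification is called when the tumor signal exceeds 2-fold."""
    return fold > threshold


# Hypothetical densitometer readings (arbitrary units)
fold = fold_amplification(900.0, 300.0, 200.0, 200.0)
print(fold, is_amplified(fold))
```

Dividing by the β-actin band first means that a lane loaded with more DNA does not register as amplified merely because every band in it is darker.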

Statistical analysis 

The Mann-Whitney non-parametric test was used to
compare the SIs of pairs of subject groups, whereas the
Kruskal-Wallis test was used for categorical data.
Correlation between indices was assessed using simple
linear regression. The Kaplan-Meier method was used to
create survival curves, which were analyzed by the
log-rank test. The impact of different variables on
survival was determined using the Cox proportional
hazards model. P values less than 0.05 were considered
significant.
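
As an illustration of the Kaplan-Meier method named above, the product-limit estimate can be computed with a few lines of plain Python. This is a minimal sketch, not the authors' software (which the text does not name); the follow-up times and event flags below are invented for the example.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored patient
for t, s in kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Patients censored at a given time are counted as still at risk for deaths at that same time, which is the usual convention for tied observations.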

Results 

The results of IHC are illustrated in Figures 1 and 2. In
general, the staining indices (SIs) of all studied markers
were higher in carcinomas than in normal colonic mucosal
samples (p = 0.0001). Normal colorectal mucosa revealed
positive immunostaining for Ki-67 in the lower half of the
crypts only. A heterogeneous staining pattern was detected
in the neoplastic cells of well- and moderately-
differentiated adenocarcinomas, whereas a diffuse
homogeneous staining pattern was detected in
poorly-differentiated carcinomas. The Ki-67 SI ranged from
10-40.2 (mean: 24.6 ± 6.5).

Figure 1

Normal colonic mucosa showing positive nuclear immunostaining for: (a) cyclin D1, (b) ISH of histone H3 mRNA, (c) Ki-67 and
(d) cyclin A

Immunostaining for cyclin D1 was predominantly nuclear,
but cytoplasmic staining was detected in some cases.
However, cases with cytoplasmic staining were considered
negative unless nuclear staining was also detected.
Normal colorectal mucosal samples were almost negative
for cyclin D1, whereas 41 of the 60 (68.3%) CRC cases
were positive. Marked heterogeneity was observed in well-
and moderately-differentiated adenocarcinomas, even
within the same tumor. Poorly-differentiated carcinomas
revealed a diffuse staining pattern with more darkly-
stained nuclei. The SI ranged from 0.5-28.6 (mean: 9.3 ±
4.2).



Positive nuclear staining for cyclin A was detected in 80%
(48/60) of CRC cases and in all non-neoplastic control
samples. Positively-stained nuclei were confined to the
lower half of the crypts in normal colonic mucosa and
diffusely dispersed in carcinomas. The SI ranged from
3.3-30.2 (mean: 15.1 ± 6.6).

Histone H3 mRNA was intensely expressed in the
cytoplasm of all examined samples, whether neoplastic or
non-neoplastic. The distribution of histone H3 mRNA was
similar to that of cyclin A and Ki-67; however, the
proportion of histone H3 mRNA-positive cells was lower
than that of Ki-67. The SI ranged from 1.8-24.2 (mean:
12.4 ± 5.3).

Figure 2

A case of well differentiated adenocarcinoma with positive immunostaining for (a) cyclin D1, (b) histone H3 mRNA, (c) Ki-67,
and (d) cyclin A. Another case of moderately differentiated adenocarcinoma with positive immunostaining for: (e) cyclin D1, (f)
histone H3 mRNA, (g) Ki-67, and (h) cyclin A. A case of poorly differentiated adenocarcinoma with diffuse staining for: (i) cyclin
D1, (j) ISH of histone H3 mRNA, (k) Ki-67 and (l) cyclin A.

The PRAD-1 probe detected 3 EcoRI fragments of 4.0, 2.2
and 2.0 Kb and 1 BglII fragment of 15 Kb. PRAD-1/cyclin D1
gene amplification was detected in 22/50 (44%) of the
cases analyzed. The degree of amplification was
heterogeneous, with a 2-10 fold increase compared to
normal mucosal samples (Figure 3). Amplification was
confirmed with other restriction enzymes.

Correlations 

There was a significant correlation between cyclin D1 gene
amplification and protein overexpression. Of the 22 cases
that showed amplification, 14 also showed protein
overexpression (concordance = 63.6%).
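
The concordance figure is simply the share of amplified cases that also overexpress the protein. A one-function sketch makes the arithmetic explicit; the function name is an illustrative assumption, while the counts are those reported in the text.

```python
def concordance(both_positive: int, amplified_total: int) -> float:
    """Percentage of amplified cases that also show protein overexpression."""
    return 100.0 * both_positive / amplified_total


# 14 of the 22 amplified cases also showed overexpression
print(f"{concordance(14, 22):.1f}%")
```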

Linear regression analysis of SIs revealed a significant
correlation between Ki-67 and cyclin D1, cyclin A and
histone H3, as well as between the SIs of cyclin A and
histone H3 (p = 0.008, 0.0001, and 0.0001 respectively)
(Figure 4). There was a significant relationship between
the SIs of both Ki-67 and cyclin A and the degree of
differentiation of tumors as well as the size of the tumor
(p < 0.001 and p < 0.01 respectively). In addition, the SIs
of Ki-67 and histone H3 were higher in patients <50 years
than in those ≥50 years (p < 0.05) (Table 1).

Figure 3

A: Southern blot analysis of normal mucosa (N) and their seven corresponding cases of colonic adenocarcinomas (T1-T7);
cases No. 1, 2, 4, and 5 are poorly differentiated whereas cases No. 3, 6, and 7 are moderately differentiated. Genomic DNA
was digested with BglII, fractionated by electrophoresis in agarose gel, transferred onto membranes and hybridized with PRAD1
and β-actin. Tumors number 1-6 (lanes 1-6) show different degrees of PRAD1/cyclin D1 amplification; tumor number 7 (lane 7)
was not amplified. B: Southern blot analysis of 3 cases of adenocarcinomas (T) and matched normal colonic mucosa (N).
Genomic DNA was digested with EcoRI, fractionated by electrophoresis in agarose gel, transferred onto membranes and
hybridized with PRAD1 and β-actin probes for loading control. The identification of the 3 tumors is the same as in Fig. 3A, with
amplification of PRAD1/cyclin D1 in tumors number 4 and 5 (lanes 1, 2) but not 7 (lane 3).



BMC Gastroenterology 2004, 4:22
http://www.biomedcentral.com/1471-230X/4/22

Figure 4

Correlation between the staining intensity of (a) Ki-67 vs. cyclin D1, (b) Ki-67 vs. histone H3, (c) Ki-67 vs. cyclin A and (d) cyclin A
vs. histone H3 mRNA expression.

In addition, Table 2 shows a significant relationship
between high cyclin D1 SI and large, poorly-differentiated
tumors, carcinomas with positive lymph node metastasis
and deeply-invasive carcinomas (p < 0.05, p < 0.001, p <
0.05 and p < 0.05 respectively). Cyclin D1 gene
amplification was significantly associated with an
advanced disease stage, since amplification was detected
in 10/15 (66.7%) of stage IV tumors compared to 12/45
(26.7%) of stage I-III tumors (p = 0.002). Similarly, DNA
amplification was detected in 60.5% (26/43) of the
carcinomas with extensive local invasion (beyond sm) but
in only 23.5% (4/17) of the carcinomas with limited
invasion (m, sm) (p = 0.001). A significant correlation was
also present between cyclin D1 gene amplification and the
presence of lymph node metastasis (p = 0.008), as well as
between the SI of histone H3 and both the size of the
tumor and the patient's age (p < 0.05 and p < 0.001
respectively). The SI was higher in tumors >5 cm in
diameter and in patients <50 years.
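
Comparisons of proportions such as these can be checked with a 2x2 contingency test. The sketch below uses Fisher's exact test purely as an illustration: the paper does not state which test produced the p values for the amplification frequencies, so the result need not match the published figures. The counts are the lymph node rows of Table 2 (amplification in 6/33 node-negative and 16/27 node-positive cases); the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Node-negative: 6 amplified, 27 not; node-positive: 16 amplified, 11 not
p = fisher_exact_two_sided(6, 27, 16, 11)
print(f"p = {p:.4f}")
```

An exact test is a reasonable choice at these sample sizes because it does not rely on the large-sample approximation behind the chi-square test.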

Survival analysis 

The mean follow-up period for all patients was 30 months
(range: 1-66 months). Eighteen of the 60 patients had
died by the time the study was completed. We defined the
cutoff level for overexpression of each cell cycle marker
at the point that showed the maximum difference in
survival rate between the 2 groups separated by that
point. Cox regression analysis revealed that cyclin A
overexpression (our definition: SI ≥ 10.5), cyclin D1
overexpression (our definition: SI ≥ 6.1), poorly
differentiated histology, lymph node metastasis, TNM
stage, tumor size and depth of invasion were all
significant prognostic variables for survival (Table 3).
The Kaplan-Meier survival curves for the subgroups of
patients subdivided according to each marker's status are
shown in Figure 5. Patients with tumors that showed Ki-67
overexpression (our definition: SI ≥ 11.5) or histone H3
overexpression (our definition: SI ≥ 8.2) tended to have a
poor prognosis, but this did not reach statistical
significance; however, overall survival was significantly
lower in patients with cyclin A and cyclin D1
overexpression. Cox multivariate regression analysis
revealed that lymph node metastasis, cyclin A
overexpression and cyclin D1 overexpression were
independent negative prognostic factors after adjustment
for the depth of tumor invasion, age and sex of the patient
(Table 4).

Table 2: The relation between cyclin D1 overexpression vs cyclin D1 amplification and clinicopathological prognostic markers.

Variables           No. of cases   Cyclin D1 overexpression   Cyclin D1 amplification

Tumor size (cm)
  <5.0              33             5.3 ± 3.8*                 13/33
  ≥5.0              27             22.8 ± 7.2, p < 0.05       9/27, p < 0.236
Histology
  GI                15             6.6 ± 4.0                  7/15
  GII               21             8.9 ± 3.6                  8/21
  GIII              24             22.0 ± 8.1, p < 0.001      7/24, p < 0.075
Lymph node
  Negative          33             5.4 ± 5.3*                 6/33 (18.2%)
  Positive          27             20.6 ± 6.9, p < 0.05       16/27 (59.3%), p < 0.008
Depth of invasion
  m, sm             17             3.1 ± 3.1*                 4/17 (23.5%)
  beyond sm         43             12.4 ± 6.5, p < 0.05       26/43 (60.5%), p < 0.001
Stage
  early             45             5.5 ± [illegible]          12/45 (26.7%)
  late              15             11.3 ± 9.6, P = 0.175      10/15 (66.7%), p < 0.002


Table 3: Univariate analysis of the relationship between survival and the tested markers

Predictive variables   Median survival   HR      CI               P

Ki-67
  <11.5                36
  ≥11.5                32                1.826   0.636-5.243      0.26
Cyclin D1
  <6.1                 35
  ≥6.1                 18                7.246   1.007-45.150     0.03*
Histone H3
  <8.2                 35
  ≥8.2                 29                4.639   0.854-25.196     0.07
Cyclin A
  <10.5                35
  ≥10.5                15                7.820   1.017-60.122     0.02*
Histological grade
  Low                  38
  High                 10                7.331   2.696-19.940     0.0001*
Lymph node
  Negative             38
  Positive             15                6.826   1.973-23.621     0.002*
Stage
  I, II, III           38
  IV                   12                6.378   1.842-22.083     0.001*
Tumor size (cm)
  <5.0                 35
  ≥5.0                 13                4.835   1.386-16.868     0.01*
Depth of invasion
  T1, T2               36
  T3, T4               20                7.759   1.024-58.789     0.04*
Age (years)
  <50                  38
  ≥50                  28                2.802   0.988-7.943      0.0526
Sex
  Male                 38
  Female               36                0.696   0.274-1.766      0.4449

* p value < 0.05 (significant)

HR: Hazard Ratio

CI: 95% confidence interval

Figure 5

Kaplan-Meier survival curves for colorectal carcinoma. Overall survival is significantly lower in patients with (a) cyclin A and (b)
cyclin D1 overexpression. Patients with high SI for histone H3 mRNA have poorer prognosis but this was not statistically
significant (c). No significant difference was present between patients with high Ki-67 SI and those with low Ki-67 SI (d).

Discussion 

The proliferative activity of CRC cells has been
investigated in several studies, either by
immunohistochemical determination of the cell
proliferation index using antibodies to some types of
cyclins or by flow cytometric determination of the SPF of
the cell cycle [8]. Although Leach et al. [17] did not find
cyclin D1 gene amplification in a panel of 47 CRC cell
lines, its protein was overexpressed in about 30% of the
CRC cases included in the studies of Bartakova et al. [6]
and Arber et al. [18]. In the former study [6], cyclin D1
was aberrantly accumulated in a significant subset of
human CRC cases, and the cell lines derived from these
cases were dependent on cyclin D1 for their cell cycle
progression. In the second study [18], overexpression of
cyclin D1 was detected in 30% of adenomatous polyps,
indicating that overexpression is a relatively early event
in colon carcinogenesis which is possibly responsible for
the pathological changes in the mucosa preceding
neoplastic transformation. More recently, Holland et al.
[19], Pasz-Walczak et al. [20] and Utsunomiya et al. [21]
reported up-regulation of cyclin D1 in 58.7%, 100% and
43% of their studied cases respectively.

Table 4: Multivariate analysis of the relationship between survival
and the tested markers

Predictive variables              HR       CI               P

Cyclin D1 (baseline < 6.1)        10.864   1.055-86.250     0.03*
Cyclin A (baseline < 10.5)        13.686   1.012-190.379    0.0490*
Positive lymph node metastasis    3.921    1.057-14.472     0.0410*
Stage IV                          3.411    1.048-12.083     0.03*
Depth of invasion (T3, T4)        5.408    0.449-65.080     0.1836
Age (years), ≥50                  1.996    0.678-5.878      0.2310
Sex                               0.910    0.315-2.358      0.8453

* p value < 0.05 (significant)

HR: Hazard Ratio

CI: 95% confidence interval



In the present study, up-regulation of cyclin D1 was
detected in 68.3% of the cases. The SI was significantly
higher in carcinomas than in normal colorectal mucosa,
and in poorly-differentiated adenocarcinomas it was
approximately twice that of other histological types.
Amplification and/or overexpression of cyclin D1
correlated significantly with deeply invasive tumors and
positive lymph node metastasis. Our results in this regard
are consistent with previous studies [8,22]. In 2001,
Holland et al. [19] demonstrated that deregulation of the
cyclin D1 and p21 proteins is important in colorectal
tumorigenesis and has implications for patient prognosis.
Similarly, McKay et al. [23] found that cyclin D1 was the
only protein in their panel (cyclin D1, p53, p16, Rb-1,
PCNA and p27) that correlated with improved outcome in
CRC patients. However, a few studies failed to detect any
correlation between cyclin D1 overexpression and the
clinicopathological factors in CRC [6,18]. This
discrepancy in results could partially be explained by
differences in the sampling of studied cases. The present
study included 24 cases of poorly differentiated
adenocarcinoma, which is uncommon in other studies of
CRC in western countries. This was possible because the
majority of CRC cases diagnosed in Egypt are of high
histological grade [3]. A correlation between cyclin D1
overexpression and high histological grade was also
reported in other tumor types, including non-small cell
lung carcinomas [24] and squamous cell carcinomas of the
larynx [16]. Another possible explanation for the observed
discrepancy between the results of different studies is the
detection method used.

In the present work, overexpression of cyclin D1 was more
common than amplification of the PRAD-1/cyclin D1 gene,
with a 63.6% concordance. This was similarly reported by
Bartakova et al. [6], who mentioned that there is a subset
of CRC cases in which cyclin D1 is overexpressed without
PRAD-1/cyclin D1 gene amplification. Consistent with this
hypothesis are reports of elevated cyclin D1 mRNA levels
and immunohistochemically detectable accumulation of
the protein in over one third of breast cancer cases, a
frequency significantly higher than that deduced from
DNA amplification studies [9,25]. These data imply that
mechanisms other than gene amplification can also lead to
deregulation and accumulation of cyclin D1 in solid
tumors.

So far, several studies have examined the prognostic
significance of cyclin D1 overexpression in various
carcinomas, including CRC [22]. However, these studies
yielded conflicting results, which could be attributed to
organ heterogeneity. In our study, patients with tumors
that exhibited cyclin D1 overexpression tended to have a
poor prognosis.

It was reported that patients with cyclin A-positive
carcinomas had significantly shorter median survival
times. Handa et al. [8] were able to detect cyclin A
overexpression in 77% of their CRC cases. They also
demonstrated that cyclin A could be used as a prognostic
factor in CRC. More recently, Habermann et al. [26]
studied cases of ulcerative colitis with and without an
associated adenocarcinoma for the presence of cyclin A
overexpression. They found that cyclin A overexpression
was higher in cases of ulcerative colitis with
adenocarcinomas than in those without adenocarcinomas.
Consequently, they concluded that cyclin A could be used
for monitoring ulcerative colitis patients and for the
early detection of an emerging carcinoma in this
high-risk group of patients.

In our study, cyclin A was detected in 80% of the patients,
and Cox regression analysis showed that it could be used
as a prognostic marker in CRC in addition to cyclin D1.

It would have been useful to assess cyclin A by another
technique (DNA amplification). This would have added
more information regarding the gene status on the one
hand and confirmed the results of IHC on the other.
Unfortunately, this was not possible because in most of
the cases included in the present work, the extracted DNA
was not sufficient to study cyclin A amplification after
the assessment of cyclin D1.

In 1996, Nagao et al. [11] reported that the histone H3
labeling index significantly correlated with Ki-67
immunostaining and was high in poorly differentiated
human hepatocellular carcinoma. This was similarly found
in the present work, since we found a significant
correlation between the SIs of histone H3 and Ki-67.
However, no statistically significant correlation was
found between histone H3 SI and any of the studied
clinicopathological factors.

Although Ki-67 immunostaining reflects the proliferative
activity of CRC, it has not been recognized as a significant
prognostic factor in this type of tumor [27,28]. However,
Suzuki et al. [29] found a significant correlation between
the Ki-67 labeling index and local invasion of CRC. In the
present study there was a significant relationship between
the SI of Ki-67, tumor size and grade. However,
Kaplan-Meier survival curves showed no significant
difference in survival rates between patients with and
without overexpression of Ki-67.

Conclusions 

Our results demonstrate that cyclin D1, cyclin A, histone H3
and Ki-67 are overexpressed in a subset of CRC; however,
only cyclin D1 and cyclin A overexpression correlate with
poor differentiation and tumor progression. This
indicates the superiority of cyclin A and cyclin D1 as
indicators of poor prognosis compared to Ki-67 and
histone H3 mRNA in CRC. Cyclin A and D1 could therefore
be considered significant, independent prognostic factors
in CRC patients. These findings are especially important
in stage II patients, since 25-30% of those patients have a
poor prognosis in spite of being node-negative. However,
the standard clinicopathologic prognostic factors cannot
identify this subset accurately; there is therefore a great
demand for more accurate, individually-based, biological
prognostic parameters that help in detecting this
high-risk group of patients who can benefit from adjuvant
therapy. If the findings of the present study are confirmed
in a larger study, evaluation of cyclin A and D1 may be
applicable to the clinical management of CRC, allowing
the identification of patients with poor prognosis.